ABSTRACT

We present an optical group catalog between 0.1 ≲ z ≲ 1 based on 16,500 high-quality spectroscopic
redshifts in the completed zCOSMOS-bright survey. The catalog published herein contains 1498
groups in total and 192 groups with more than five observed members. The catalog includes both group 
properties and the identification of the member galaxies. Based on mock catalogs, the completeness 
and purity of groups with three or more members should both be about 83% with respect to all
groups that should have been detectable within the survey, and more than 75% of the groups should 
exhibit a one-to-one correspondence to the "real" groups. Particularly at high redshift, there are 
apparently more galaxies in groups in the COSMOS field than expected from mock catalogs. We 
detect clear evidence for the growth of cosmic structure over the last seven billion years in the sense 
that the fraction of galaxies that are found in groups (in volume-limited samples) increases significantly 
with cosmic time. In the second part of the paper, we develop a method for associating galaxies that 
have only photo-z to our spectroscopically identified groups. We show that this leads to improved
definition of group centers, improved identification of the most massive galaxies in the groups, and 
improved identification of central and satellite galaxies, where we define the former to be galaxies 
at the minimum of the gravitational potential wells. Subsamples of centrals and satellites in the 
groups can be defined with purities up to 80%, while a straight binary classification of all group and 
non-group galaxies into centrals and satellites achieves purities of 85% and 75%, respectively, for the 
spectroscopic sample. 

Key words: catalogs - cosmology: observations - galaxies: groups and clusters: general - galaxies:
evolution - large-scale structure of universe - methods: data analysis
1. INTRODUCTION

Galaxy groups are gravitationally bound systems that 
contain multiple galaxies inhabiting the same dark matter
(DM) halo. They are of interest for two main reasons.
First, by regarding them as DM halos they can serve as 
cosmological probes. The number density and cluster- 
ing of groups for a given halo mass and cosmic epoch 
depend on the underlying cosmology. Second, galaxy 
groups constitute an environment for galaxies which is 
special compared with the general field. The enhanced
proximity of other galaxies and the presence of an intra- 
group medium may produce distinct evolutionary pro- 
cesses in groups such as enhanced merging rates (Spitzer 
& Baade 1951), galaxy harassment (Moore et al. 1996), 
ram pressure stripping (Gunn & Gott 1972), or strangu- 
lation (Balogh & Morris 2000) which may be significant 
for the general evolution of galaxies, and particularly the 
environmental differentiation of the galaxy population 
(e.g. Weinmann et al. 2006; Gerke et al. 2007; Iovino
et al. 2010; Kovac et al. 2010; Peng et al. 2010). A dif- 
ference between central and satellite galaxies in groups is 
now an established part of our view of galaxy evolution 
(e.g. van den Bosch et al. 2008; Skibba 2009; Pasquali 
et al. 2010; Skibba et al. 2011; Peng et al. 2011). A key 
requirement for both areas is the availability of large, 
high quality group catalogs. 

There are several desired properties for a group cat- 
alog. Purity and completeness are two often conflicting 
requirements: completeness is the fraction of real groups
that are recovered, while purity reflects the reality of the 
claimed groups. Once the groups are identified, one can 
further define purity and completeness for the member- 
ship of individual galaxies in these groups. The optimiza- 
tion between completeness and purity will often depend 
on the application: high purity catalogs covering a large 
redshift range enable studies of galaxy evolution in differ- 
ent environments over cosmic time. On the other hand, 
having complete catalogs which trace the numbers of real 
groups and provide reliable mass estimates for individual 
groups is important for cosmological studies. The esti- 
mation of reliable masses for individual groups in turn 
requires a high degree of one-to-one correspondences be- 
tween reconstructed and real groups. Precise estimates 
for the group centers are needed for stacking analyses of
X-ray properties or detection of the weak lensing sig- 
nal, while studying the differences between "central" and 
"satellite" galaxies requires complete group populations 
down to a given flux limit since otherwise the central 
galaxy cannot be reliably identified. 

In this paper we present a new group catalog produced
with the zCOSMOS-bright survey (Lilly et al. 2007),
which now contains about 16,500 high quality spectroscopic
galaxies with I_AB < 22.5 in the redshift range
0.1 < z < 1.2 (the "20k sample"). zCOSMOS-bright covers
the ~1.7 deg² of the COSMOS field (Scoville et al.
2007b), which was fully observed by the Hubble Space
Telescope (Scoville et al. 2007a; Koekemoer et al. 2007)
down to I_AB < 28 (5σ) and followed up in more than
30 bands by several telescopes from radio to X-ray wave- 
lengths (Capak et al. 2007). This unique combination of 
observational data on a single field makes the COSMOS 
field very suitable for studying the properties of groups 
as a function of redshift and the evolution of galaxies 
in different environments. The large number of wavelength
bands also allows the production of high quality
photometric redshifts ("photo-z") with an accuracy of
σ_z ≈ 0.01(1 + z) (e.g. Ilbert et al. 2009) for the brighter
galaxies, allowing the possibility of using these to sup- 
plement the spectroscopic redshifts and assign, at least 
probabilistically, group membership to these galaxies. 

The first major data release of zCOSMOS entailed 
about 8,500 spectroscopic galaxy redshifts (the "10k
sample", Lilly et al. 2009) and was used to produce a first
optical group catalog in the redshift range 0.1 < z < 1 
(Knobel et al. 2009, "K09"). In that paper we dis- 
cussed in detail the group-finding methods and basic 
properties of the "10k group catalog". We adopted two
group-finding algorithms, friends-of-friends (FOF) and a
Voronoi-Delaunay method (VDM), and compared their 
performances on simulated mock galaxy samples. We 
introduced a "multi-run scheme" in which we succes- 
sively used different group-finding parameters, optimized 
for different richness groups, where by richness we al- 
ways refer to the number N of observed spectroscopic 
members. By initially tuning the parameters to detect 
only the richest groups, and then ignoring the subsequent 
fragmentation of these into smaller groups when the pa- 
rameters were tuned to smaller scales, we could improve 
the statistics of the catalog in terms of completeness and 
purity over a wide range of scales, minimizing the effects 
of fragmentation and overmerging (see K09 for a discussion).
The FOF catalog was used as the basic 10k group
catalog while the VDM catalog was used to produce sub- 
catalogs with further enhanced purities. The basic 10k 
catalog contained 802 groups in total and 102 groups 
with more than five members. 
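
The multi-run bookkeeping described above can be illustrated with a small sketch. It is not the K09 implementation: the names `multi_run` and `toy_fof_1d` are hypothetical, the groupfinder is a one-dimensional toy, and we assume that members of groups accepted in an earlier (richer) step are withheld from later steps, which is one simple way of ignoring the subsequent fragmentation of rich groups.

```python
def toy_fof_1d(positions, link):
    """Hypothetical 1D friends-of-friends: sorted neighbors closer than `link` are linked."""
    groups, current = [], []
    for x in sorted(positions):
        if current and x - current[-1] > link:
            groups.append(set(current))
            current = []
        current.append(x)
    if current:
        groups.append(set(current))
    return [g for g in groups if len(g) >= 2]


def multi_run(galaxies, steps, find_groups):
    """Combine single runs, ordered from rich to poor, into one catalog.

    steps: list of (n_min, n_max, params); a group found in a step is kept
    only if n_min <= N <= n_max. Members of kept groups are withheld from
    later steps (assumed bookkeeping), so runs tuned to smaller scales
    cannot fragment groups that were already accepted.
    """
    remaining = set(galaxies)
    catalog = []
    for n_min, n_max, params in steps:
        for group in find_groups(sorted(remaining), params):
            if n_min <= len(group) <= n_max:
                catalog.append(set(group))
                remaining -= set(group)
    return catalog


galaxies = [0.0, 0.1, 0.2, 0.3, 0.35, 5.0, 5.05, 9.0]
steps = [(4, 10, 0.15), (2, 3, 0.06)]  # rich groups first, then pairs and triplets
print(multi_run(galaxies, steps, toy_fof_1d))
print(toy_fof_1d(galaxies, 0.06))  # a single small-scale run on all galaxies
```

With the two-step scheme the five nearby points survive as one rich group, whereas the small-scale run alone keeps only the fragment {0.3, 0.35} of that structure.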

The group catalog presented in this paper is created in 
a similar way to that in K09 from the larger sample that 
is now available. However, since it now contains groups 
extending up to 30 members, we had to slightly extend 
the methods to guarantee its high quality over this wider 
range of richness. 

In contrast to the zCOSMOS 10k sample whose com- 
pleteness was only about 30% and for which it would 
have made little sense to use information from photo-z, 
the completeness of the 20k sample now exceeds 50% and 
the photo-z objects become a minority. Thus it becomes 
attractive to try to associate these remaining photo-z ob- 
jects to the spectroscopically identified groups, so that an 
idea of group membership can be obtained for all galaxies 
down to the magnitude limit of the survey. This is useful 
for many scientific goals. We therefore develop a method 
for incorporating the photo-z galaxies into the spectro- 
scopic group population by assigning to each photo-z 
galaxy a probability that it is a member of a given group. 
This probability is based on the projected spatial dis- 
tance of the galaxy from the group center and its photo-z 
relative to the redshift of the group, calibrated against 
mock catalogs. Including the photo-z galaxies enables 
improved estimates of the location of the group center, 
and improved identification of the most massive galaxy 
in the group, and of the galaxy lying at the center of 
the potential well, which we define as the central galaxy. 
For the latter two cases, we can construct various samples 
which represent trade-offs between completeness and purity.
In addition, we examine how well we can apply a
binary central-satellite classification to all galaxies in the 
sample, including those not associated with groups. 

With the final 20k sample we produce a group catalog
containing almost 1500 groups in the redshift range
0.1 ≲ z ≲ 1. Other major group catalogs at redshift
z > 0.3 are the one from the DEEP2 survey (Davis et al. 
2003) containing ~2400 groups (Gerke et al. 2005, 2012) 
in the redshift range 0.7 < z < 1.4 and the one from 
VVDS (Le Fevre et al. 2005) containing ~ 300 groups 
in the redshift range 0.2 < z < 1 (Cucciati et al. 2010), 
so the new zCOSMOS catalog is one of the largest published
group catalogs at high redshift (and the largest
on a contiguous field) and features very good statistics 
compared to the other group catalogs at high redshift 
in the literature. A special feature of our group cata- 
log is the availability of group centers that are based on 
a sophisticated approach and the possibility to produce 
high-purity samples of central and satellite galaxies. 

This paper is organized as follows. In Section 2, we
describe the observational and mock data used for our 
work. In Section 3, we describe the method of group 
identification and the statistical results obtained using 
the mock catalogs. We then give a detailed description 
of the final zCOSMOS spectroscopic group catalog in 
Section 4 and perform some comparisons with the mock 
catalogs. In the second part of the paper, we first de- 
velop, in Section 5, the method for associating photo-z 
galaxies to the spectroscopically identified groups. We 
then discuss in Section 6 how this can lead to improved 
definitions of the corrected richness of the groups, of the 
most massive galaxies, of the spatial centers, and of the 
central galaxies, defined as those at the bottom of the 
potential well. The properties of the centrals and satel- 
lites will be explored in two further papers in prepara- 
tion (C. Knobel et al., in preparation; K. Kovac et al., 
in preparation). In Section 7 we finally comment on the 
general difficulties in producing high quality group cata- 
logs and in Section 8 we conclude the paper. 

Throughout the paper we will frequently make comparisons with
the set of 24 mock catalogs, which are 24 different re- 
alizations of a single model universe. When we apply 
a general algorithm to the mock catalogs, the scatter 
among the 24 returned values represents the minimum 
uncertainty that can be expected when we apply the al- 
gorithm to the actual data, due to issues such as cosmic 
variance. We refer to this as the standard deviation of the 
relevant parameter among the mock catalogs. It is this 
scatter which is appropriate when we wish to consider 
whether the real data are or are not consistent with the 
model universe of the mock catalogs. The best estimate 
of the overall performance of the algorithm in question 
obviously comes from the average of all 24 mock catalogs. 
The uncertainty in this estimate is given by the standard
deviation above divided by √24. We will refer to this as
the standard deviation of the mean.
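
As a minimal sketch of the two uncertainty estimates, assume the per-mock results are available as a plain list of 24 numbers. The values below are invented for illustration, and we assume the sample standard deviation (with N − 1 in the denominator), which the text does not specify.

```python
import math
import statistics


def mock_scatter(values):
    """Return (mean, standard deviation among mocks, standard deviation of the mean)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # scatter among the mock catalogs
    sd_mean = sd / math.sqrt(len(values))  # uncertainty of the mean over all mocks
    return mean, sd, sd_mean


# Invented per-mock results (e.g. a completeness), one value per mock catalog.
values = [0.83 + 0.01 * ((i % 5) - 2) for i in range(24)]
mean, sd, sd_mean = mock_scatter(values)
print(f"mean = {mean:.3f}, std among mocks = {sd:.4f}, std of mean = {sd_mean:.4f}")
```

The first scatter is the one to compare the real data against; the second, smaller by √24 ≈ 4.9, describes how well the average performance of the algorithm is known.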

Where necessary, a concordance cosmology with H₀ =
70 km s⁻¹ Mpc⁻¹, Ω_m = 0.25, and Ω_Λ = 0.75 is applied.
All magnitudes are quoted in the AB system. We use
the term "dex" to express the antilogarithm, i.e. 0.1 dex
corresponds to a factor 10^0.1 ≈ 1.259.
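
For illustration, the concordance parameters above enter distance calculations through the dimensionless Hubble parameter E(z) = [Ω_m(1 + z)³ + Ω_Λ]^(1/2) of a spatially flat model. The following minimal sketch assumes flatness, neglects radiation, and uses Simpson's rule; the paper does not state which distance measures or integrator it uses. It computes the line-of-sight comoving distance and the dex-to-factor conversion.

```python
import math

C_KM_S = 299792.458                      # speed of light in km/s
H0, OMEGA_M, OMEGA_L = 70.0, 0.25, 0.75  # concordance values adopted in the text


def hubble_e(z):
    """Dimensionless Hubble parameter E(z) for a flat model without radiation."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, n=1000):
    """Line-of-sight comoving distance in Mpc, via composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = z / n
    total = 1.0 / hubble_e(0.0) + 1.0 / hubble_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / hubble_e(i * h)
    return (C_KM_S / H0) * total * h / 3.0


def dex_to_factor(dex):
    """A difference of `dex` in log10 corresponds to a factor 10**dex."""
    return 10.0 ** dex


print(f"D_C(z = 1) = {comoving_distance(1.0):.0f} Mpc")
print(f"0.1 dex = factor {dex_to_factor(0.1):.3f}")
```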

2. DATA 

In this section we describe the data that have been used 
for this paper. First, we give an overview of the zCOS- 
MOS survey from which the spectroscopic redshifts are 
taken, then we describe the derivation of the photomet- 
ric redshifts, masses, and absolute magnitudes using the 
photometry of the COSMOS survey, and finally we de- 
scribe the construction of realistic mock galaxy samples. 

2.1. The zCOSMOS survey 

zCOSMOS (Lilly et al. 2007, 2009; S. J. Lilly et al.
2012, in preparation) is a deep spectroscopic galaxy
survey on the 1.7 deg² of the COSMOS field (Scoville
et al. 2007b), which utilized about 600 hours of ESO
VLT time in service mode. It is divided into two parts,
"zCOSMOS-bright" and "zCOSMOS-deep". The former
covers mainly the redshift range 0.1 ≲ z < 1.2 and almost
the entire COSMOS field, while the latter aims to
cover the redshift range 1.5 < z < 3 on the central ~1
deg² of the COSMOS field.

The current work is entirely based on zCOSMOS- 
bright, which is now complete and contains spectra for 
about 20,000 objects taken using the VIMOS spectro- 
graph (Le Fevre et al. 2003) with a medium-resolution 
grism. The target catalog consisted essentially of all
objects within the magnitude range 15 < I_AB < 22.5.
Suspected stars were excluded. The slits were assigned 
to the targets such that for each mask the number of 
slit assignments on each of the four VIMOS quadrants 
was maximized, except for some X-ray and radio
objects, which were observed at high priority. Since there
were two masks per pointing and the pointings were over- 
lapping with centers differing by the size of a quadrant, 
there were finally eight passes for the central field, four 
at the borders, and two at the corners. 

About 2% of all spectra come from "secondary" ob- 
jects, i.e. objects that were potential targets which 
serendipitously ended up in slits targeted at other galax- 
ies. They are not only very helpful for estimating the 
accuracy and verification rate of redshifts, but also com- 
pensate for the bias against close pairs due to slit con- 
straints (de Ravel et al. 2011; Kampczyk et al. 2011). Af- 
ter removing less reliable redshifts (i.e. confidence classes 
0, 1.1, 2.1 and 9.1; see Lilly et al. 2009) and spec- 
troscopic stars, we end up with a high quality redshift 
galaxy sample containing 16,776 objects within the area 
149.47° < α < 150.77° and 1.62° < δ < 2.83°. From
multiply observed objects, the spectral verification rate
for this sample is about 99% and the redshift accuracy
about 100 km s⁻¹, which is sufficient to probe the cosmic
group environment. The remaining objects and all
those not observed spectroscopically have photo-z avail- 
able. Henceforth we will refer to this sample of secure 
redshifts as the "20k sample".

The spatial sampling rate (SSR), i.e. the fraction of ob- 
jects of the magnitude-limited target catalog whose spec- 
tra were observed, is a function of (α, δ) and is shown in
Figure 1. According to the design of zCOSMOS, there is a
central region (α = 150.12 ± 0.54° and δ = 2.22 ± 0.46°;
see the black rectangle) with a substantially higher SSR
than at the borders. Even in the central region, the SSR
is not completely uniform, exhibiting some stripes due to 
the placement of slits in the masks. The redshift success 
rate (RSR) is the fraction of observed spectra that have 
yielded a reliable redshift. The RSR is mostly a function 
of apparent magnitude and redshift of the galaxies and 
only weakly dependent on color (see Figs. 2 and 3 of Lilly 
et al. 2009). 

To a good approximation, the SSR and RSR can be assumed to
be uncorrelated, so that by multiplying them we obtain
for each galaxy the completeness with respect to an ideal
magnitude-limited survey. The full zCOSMOS area has 
an average completeness of 48%, while for the central 
region it rises to 56%. For some applications it is useful 
to restrict the area of the survey to the central region 
where the sampling rate is highest. It should be noted
that the redshift distribution of galaxies in the COSMOS
field shows two prominent features at redshifts ~0.35 and
~0.7 (cf. Fig. 1 of K09).

Figure 1. Spatial sampling rate (SSR) of the zCOSMOS 20k
sample. The color bar indicates the SSR, which is computed in
pixels of 1.5 arcmin. The black rectangle shows the central region
for the 20k sample.
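
The per-galaxy completeness described above (the product SSR × RSR) can be sketched in a few lines under the stated assumption that SSR and RSR are uncorrelated. The helper names and the three (SSR, RSR) pairs are hypothetical and are not zCOSMOS values.

```python
def completeness(ssr, rsr):
    """Per-galaxy completeness SSR * RSR (assumes SSR and RSR are uncorrelated)."""
    if not (0.0 <= ssr <= 1.0 and 0.0 <= rsr <= 1.0):
        raise ValueError("sampling and success rates must lie in [0, 1]")
    return ssr * rsr


def mean_completeness(targets):
    """Average completeness over a list of (ssr, rsr) pairs, one per target."""
    return sum(completeness(s, r) for s, r in targets) / len(targets)


# Hypothetical (SSR, RSR) pairs for three target galaxies.
targets = [(0.70, 0.90), (0.60, 0.85), (0.40, 0.80)]
print(f"mean completeness = {mean_completeness(targets):.3f}")
```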

2.2. Photometric redshifts 

Photometric redshifts (photo-z), masses, and absolute 
magnitudes were derived from spectral energy distribu- 
tion (SED) fitting using ZEBRA+ (Oesch et al. 2010), 
which is an extension of ZEBRA (Feldmann et al. 2006), 
to allow for the derivation of physical properties of the 
galaxies using stellar population synthesis models. 

The photo-z were derived from a fit of empirical tem- 
plates to 26 photometric bands from u* (CFHT) to 
Spitzer IRAC4.8 including 12 broad-band, 12 interme- 
diate band, and 2 narrow-band filters. The empirical 
template set was based on Bruzual & Charlot (2003)
models, to which emission lines were added, before running
the template correction module of ZEBRA based
on a random subsample of zCOSMOS spectroscopic redshifts.
For the few hundred XMM-Newton X-ray sources,
the photo-z provided by Salvato et al. (2009) were used
(published in Brusa et al. 2010).

The stellar masses were subsequently derived from
standard Bruzual & Charlot (2003) models with an initial
stellar mass function of Chabrier (2003) and dust
extinction according to Calzetti et al. (2000). Due to
the absence of emission lines in the model SEDs, only
the broad-band photometry was used for the SED fit,
where the redshift was fixed at the spec-z of the galaxy,
if available, or otherwise at the adopted photo-z.

In order to increase the fidelity of our photo-z sample,
we excluded 5% of the objects by applying a cut in the
resulting χ² from the SED fit, and required that for each
object at least nine broad-band filters were available.
Comparison with the spectroscopic control sample yielded
a photo-z error of about 0.01(1 + z) and a catastrophic
failure rate of 2%-3%, where a catastrophic failure is defined
by |z_spec − z_phot| > 0.04(1 + z). (The subsample that was
excluded had a catastrophic failure rate of ~60%.) We
compared our stellar masses to those derived using Hyperzmass
(see Bolzonella et al. 2010), which yielded an
uncertainty in stellar mass of about 0.2 dex. Note that 
the stellar masses were derived without considering mass 
return in the sense that "stellar mass" is simply the inte- 
gral of the star formation rate, since this is more useful 
for most purposes. These masses are typically 0.2 dex 
larger than when considering mass return. 
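
The two photo-z quality measures quoted above can be computed from a spectroscopic control sample as in the following sketch. We assume that the (1 + z) normalization uses the spectroscopic redshift and that the "photo-z error" is the plain standard deviation of Δz/(1 + z) after removing catastrophic failures; the text does not specify the estimator, and robust alternatives such as the normalized median absolute deviation are also common.

```python
import statistics


def photoz_quality(z_spec, z_phot, outlier_cut=0.04):
    """Return (scatter, catastrophic_rate) of photo-z against spec-z.

    A catastrophic failure is |z_spec - z_phot| > outlier_cut * (1 + z_spec).
    The scatter is the population standard deviation of
    (z_phot - z_spec) / (1 + z_spec) over the non-catastrophic objects.
    """
    dz = [(zp - zs) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]
    good = [d for d in dz if abs(d) <= outlier_cut]
    rate = 1.0 - len(good) / len(dz)
    scatter = statistics.pstdev(good) if len(good) > 1 else float("nan")
    return scatter, rate


# Invented control sample: three good photo-z and one catastrophic failure.
scatter, rate = photoz_quality([0.5, 0.5, 0.5, 0.5], [0.51, 0.49, 0.50, 0.90])
print(f"sigma = {scatter:.4f} (1+z), catastrophic rate = {rate:.0%}")
```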

2.3. Mock catalogs 

The mock catalogs that are used for tuning the group-finding
parameters and for comparing our results with
cosmological simulations are adapted from the COSMOS
mock light cones (Kitzbichler & White 2007), which
are based on the Millennium DM N-body simulation
(Springel et al. 2005) run with the cosmological parameters
Ω_m = 0.25, Ω_Λ = 0.75, Ω_b = 0.045, h = 0.73, n = 1,
and σ_8 = 0.9. The semi-analytic recipes for populating
the DM halos with galaxies are those of Croton et al.
(2006) as updated by De Lucia & Blaizot (2007). There
are 24 independent mock catalogs, each covering an area
of 1.4 deg × 1.4 deg with an apparent magnitude limit of
r < 26 and a redshift range of z < 7.

The mock catalogs were adjusted to resemble as closely 
as possible the actual 20k sample. For details we refer
to K09. After applying a magnitude cut, the mean
number of galaxies in the mock catalogs (averaged over
all 24 fields) is slightly different from the number of
galaxies in the zCOSMOS target catalog (a 1σ-2σ effect).
Since the density of galaxies is important for tuning
the group-finding parameters, we applied a small adjustment,
uniform across all mock catalogs and smoothly
varying in redshift, to the magnitude limit for the mock
catalogs so as to match the correct (smoothed) number
of galaxies with redshift. This intervention has, however,
only a very small effect on the analysis in this paper,
and we usually checked that our results did not depend
sensitively on this alteration. We then applied the SSR
and RSR to the mock catalogs by randomly removing
galaxies from the magnitude-limited mock sample and
implemented a Gaussian redshift measurement error of
δz = 100(1 + z)/c km s⁻¹.
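
The thinning and redshift-error steps applied to the mocks can be sketched as below. This is a hypothetical illustration, not the code used for the paper: `observe_mock` and its arguments are our own names, and the completeness is passed in as a function returning SSR × RSR for each mock galaxy.

```python
import random

C_KM_S = 299792.458  # speed of light in km/s


def observe_mock(galaxies, completeness, sigma_v=100.0, seed=None):
    """Thin a magnitude-limited mock and add spectroscopic redshift errors.

    galaxies:     list of (galaxy_id, z_true) pairs
    completeness: callable galaxy_id -> SSR * RSR in [0, 1]
    Each galaxy is kept with probability completeness(galaxy_id); kept
    galaxies get z_obs = z_true + N(0, sigma_v * (1 + z_true) / c).
    """
    rng = random.Random(seed)
    observed = []
    for gid, z in galaxies:
        if rng.random() < completeness(gid):
            sigma_z = sigma_v * (1.0 + z) / C_KM_S
            observed.append((gid, z + rng.gauss(0.0, sigma_z)))
    return observed


mock = [(i, 0.5) for i in range(10000)]
observed = observe_mock(mock, lambda gid: 0.5, seed=1)
print(len(observed), "of", len(mock), "mock galaxies kept")
```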

For the second part of the paper, we extend the spec- 
troscopic 20k mock samples by adding simulated photo- 
z galaxies so that the spec-z and photo-z mock samples 
add up to the I_AB < 22.5 complete samples for each mock
catalog. That is, each galaxy brighter than the flux limit
that is not part of the spec-z mock sample was assigned
a photometric redshift by perturbing its original redshift
by an amount drawn from a Gaussian distribution with
standard deviation δz = 0.01(1 + z). We also perturbed
the stellar masses of all galaxies, spec-z as well as photo-z,
by adding a Gaussian random number with standard
deviation of 0.2 to log(M/M_⊙) to mimic the stellar mass
uncertainty of 0.2 dex of the actual data. 
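
A corresponding sketch of the photo-z extension of the mocks, again with hypothetical names: galaxies that are not in the spec-z mock receive z + N(0, 0.01(1 + z)), and every galaxy receives a 0.2 dex Gaussian perturbation of its log stellar mass. For brevity, the spec-z galaxies keep their input redshift here; in the full procedure they would carry the spectroscopic errors added in the previous step.

```python
import random


def add_photoz_mocks(galaxies, spec_ids, sigma0=0.01, sigma_logm=0.2, seed=None):
    """Complete a spec-z mock with simulated photo-z galaxies.

    galaxies: list of (galaxy_id, z_true, log_mass) for all I_AB < 22.5 mock galaxies
    spec_ids: set of ids already in the spec-z mock
    Returns (galaxy_id, z_est, log_mass_obs, is_spec) tuples. Non-spec galaxies
    get z_est = z_true + N(0, sigma0 * (1 + z_true)); all galaxies get an
    N(0, sigma_logm) perturbation of log10(M / M_sun).
    """
    rng = random.Random(seed)
    out = []
    for gid, z, logm in galaxies:
        is_spec = gid in spec_ids
        z_est = z if is_spec else z + rng.gauss(0.0, sigma0 * (1.0 + z))
        logm_obs = logm + rng.gauss(0.0, sigma_logm)
        out.append((gid, z_est, logm_obs, is_spec))
    return out


gals = [(i, 0.5, 10.5) for i in range(20000)]
mock = add_photoz_mocks(gals, spec_ids=set(range(0, 20000, 2)), seed=42)
print(sum(1 for row in mock if not row[3]), "photo-z galaxies added")
```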

3. GROUP-FINDING METHOD 

In this section, we describe the method of group iden- 
tification and provide the resulting group catalog statis- 
tics as obtained with the mock catalogs. We will slightly 
modify the methods presented in K09 to optimize them 
for the 20k sample. A novelty of the 20k group catalog is 
the existence of a larger number of relatively rich groups
with N > 12, so that the optimization strategy has to
be adapted to yield stable statistics for these higher
richness classes as well. The application to the zCOSMOS
20k sample is presented in the next section.

Figure 2. Fraction of detectable halos in the zCOSMOS 20k mock
samples as a function of redshift, where detectable corresponds to
having at least two members with spectroscopic redshifts brighter than
I_AB = 22.5 after the spatial sampling and spectroscopic success
rates are applied. The lines (from bottom to top) correspond to
groups more massive than 11, 11.5, 12, 12.5, 13, 13.5, and 14,
respectively, in units of log(M/M_⊙). The shaded area is the standard
deviation among the 24 mock catalogs.

3.1. Definitions 

We will mainly follow the terminology and statistics 
introduced in Section 3.2 of K09 which shall be briefly 
summarized in the following. For details we refer to K09. 

A group is defined as the set of galaxies occupying the 
same DM halo.¹ In the mock catalogs we know exactly
which galaxies are in which groups and we denote
the corresponding sets of galaxies as the "real groups". 
On the other hand, the groups obtained by running
a groupfinder on actual or mock data are called
"reconstructed groups" . The aim of group-finding is to 
tune the parameters of the groupfinder so that the re- 
sulting catalog of reconstructed groups approaches as 
closely as possible the catalog of real groups, as mea- 
sured by certain statistics. It should be stressed that 
the "real" groups correspond to those DM halos which 
would be "detectable" (i.e. which host at least two galax- 
ies with spectroscopic redshift measurements) in a galaxy 
survey with the same characteristics as zCOSMOS. Fig- 
ure 2 shows the fraction of these detectable DM halos as 
compared to the overall sample of all DM halos. Note 
that more than 90% of DM halos of mass > lO"'^ Mq 
are detectable up to a redshift of z ~ 0.8, while, for 
groups more massive than 10^^'^ Mq, the completeness 
decreases linearly with redshift from ~90% at z ~ 0
down to ~10% at z ~ 0.9.

¹ Since the groupfinders are calibrated using the mock catalogs,
the definition of a DM halo used in this paper corresponds to the
operational definition of a DM halo in the Millennium simulation.
That is, a DM halo is a friends-of-friends group of DM particles
with a linking length of b = 0.2. These groups ideally correspond
to structures with a mean overdensity of roughly 200.

With the concepts of the "real" and "reconstructed"
groups we can define the "completenesses" and "purities"
of samples of reconstructed groups by associating the real 
groups to reconstructed groups and vice versa. A real 
(reconstructed) group is associated to a reconstructed 
(real) group if the former contains more than 50% of the 
members of the latter. All such associations are called 
"one-way matches" (1WM). If the association is mutual,
then we also call it a "two-way match" (2WM; see Fig. 3
of K09 for illustration). Every 2WM is thus also a 1WM.
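
The association rules can be made concrete with a small sketch. It follows the wording above literally (group A is associated to group B if A contains more than 50% of the members of B), assumes that each galaxy belongs to at most one group per catalog, and uses hypothetical function names; a 2WM is then simply a pair of groups associated in both directions.

```python
from collections import Counter


def one_way(groups_a, groups_b):
    """Associations A -> B: group A is associated to group B if A contains
    more than 50% of the members of B.

    groups_a, groups_b: dict group_id -> set of galaxy ids; each galaxy is
    assumed to belong to at most one group within a catalog.
    Returns dict a_id -> set of b_ids (empty if A has no association).
    """
    owner = {gal: a_id for a_id, members in groups_a.items() for gal in members}
    assoc = {a_id: set() for a_id in groups_a}
    for b_id, members in groups_b.items():
        overlap = Counter(owner[g] for g in members if g in owner)
        for a_id, n_shared in overlap.items():
            if n_shared > len(members) / 2:
                assoc[a_id].add(b_id)
    return assoc


def two_way(groups_a, groups_b):
    """Mutual associations (2WM) as a set of (a_id, b_id) pairs."""
    ab = one_way(groups_a, groups_b)
    ba = one_way(groups_b, groups_a)
    return {(a, b) for a, bs in ab.items() for b in bs if a in ba[b]}


real = {"R1": {1, 2, 3, 4}, "R2": {5, 6}}
recon = {"G1": {1, 2, 3}, "G2": {4, 5, 6, 7}}  # G2 merges R2 with two other galaxies
print(one_way(real, recon), one_way(recon, real), two_way(real, recon))
```

In this toy example G2 is associated to R2 (it contains both members of R2), but R2 is not associated to G2, so that pair is only a 1WM, while R1 and G1 form a 2WM.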

In K09 we demonstrated that the statistics of the group 
catalog can strongly depend on the richness N, which 
is the number of observed spectroscopic members for a 
given group, and we introduced the multi-run scheme to 
overcome this. To check this aspect of our catalog, we in- 
vestigate the statistics as a function of N in what follows. 
It should be noted that N will be biased with respect to 
redshift, since it refers to galaxies above the survey flux 
limit, and it is also affected by the local sampling rate. 
Hence, the richness is a parameter that describes the 
identification of a group, and the amount of information 
about it, rather than the actual number of galaxies that 
reside in it. To obtain an estimate of the actual number
of members, unbiased with respect to redshift, the corrected
richness (see Sects. 4.2 and 6.1), defined in terms of a
volume-limited galaxy sample, should be used.

The one-way completeness c_1(N) is then defined as the
number of 1WM of real groups of richness N to reconstructed
groups of any richness divided by the number of
real groups of richness N. Note that in K09 we defined
these quantities in a cumulative way, i.e. always for ≥ N,
whereas here we define them as functions of N only. The
two-way completeness c_2(N) is defined analogously by
considering 2WM instead of 1WM. Similarly, the one-way
purity p_1(N) is defined as the number of 1WM of
reconstructed groups of richness N to real groups of any
richness, normalized by the number of reconstructed groups
of richness N, and the two-way purity p_2(N) is obtained
by exchanging 1WM with 2WM. While these statistics
are made on a group-by-group basis, there are analogous
statistics referring to individual galaxy memberships in
groups: the galaxy success rate S_gal(N) in correctly
assigning group membership to galaxies, and the
interloper fraction f_I(N), which gives the fraction of
non-group galaxies that are incorrectly assigned to groups.
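
Given which groups have a match, the group-by-group statistics above are simple fractions per richness bin. The sketch below (hypothetical helper `rate_by_richness`, invented example values) computes them: passing the real groups together with the subset that has a 1WM yields c_1(N), passing the reconstructed groups with their 1WM subset yields p_1(N), and likewise with 2WM for c_2(N) and p_2(N).

```python
from collections import defaultdict


def rate_by_richness(richness, matched):
    """Fraction of groups of each richness N that have a match.

    richness: dict group_id -> N (number of observed spectroscopic members)
    matched:  set of group ids with a match (1WM or 2WM) in the other catalog
    Returns dict N -> fraction, sorted by N.
    """
    total, hits = defaultdict(int), defaultdict(int)
    for gid, n in richness.items():
        total[n] += 1
        hits[n] += gid in matched  # True counts as 1
    return {n: hits[n] / total[n] for n in sorted(total)}


# Invented example: c_1(N) from real groups and the subset with a 1WM.
real_richness = {"R1": 4, "R2": 2, "R3": 2}
real_with_1wm = {"R1", "R2"}
print(rate_by_richness(real_richness, real_with_1wm))
```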

In addition to these statistics we also introduced in
K09 the figures of merit g_1 and g_2:

$$ g_1(N) \equiv \sqrt{\bigl(1 - c_1(N)\bigr)^2 + \bigl(1 - p_1(N)\bigr)^2} \qquad (1) $$

$$ g_2(N) \equiv \frac{c_2(N)}{c_1(N)} \, \frac{p_2(N)}{p_1(N)} \qquad (2) $$

They are defined such that, for realistic catalogs, they
take values between 0 and 1. g_1(N) is a measure of the
balance (or trade-off) between 1WM completeness and
purity, and g_2(N) is a measure of the balance between
fragmentation and overmerging of reconstructed groups.
For a good group catalog, g_1 should be close to zero and
g_2 close to one over the whole range of richnesses. In this
paper, we introduce another figure of merit

$$ \tilde{g}_1(N) \equiv \sqrt{\bigl(1 - c_2(N)\bigr)^2 + \bigl(1 - p_2(N)\bigr)^2} \qquad (3) $$

which is similar to g_1 except that all 1WM statistics are
replaced by their 2WM counterparts.
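
Equations (1)-(3) translate directly into code. The minimal sketch below evaluates them for a single richness bin from given completenesses and purities; the numbers in the example are illustrative, not zCOSMOS results.

```python
import math


def figures_of_merit(c1, p1, c2, p2):
    """Return (g1, g2, g1_tilde) of Eqs. (1)-(3) for one richness bin.

    c1, p1: one-way (1WM) completeness and purity
    c2, p2: two-way (2WM) completeness and purity (c2 <= c1, p2 <= p1)
    """
    g1 = math.hypot(1.0 - c1, 1.0 - p1)        # Eq. (1)
    g2 = (c2 / c1) * (p2 / p1)                 # Eq. (2)
    g1_tilde = math.hypot(1.0 - c2, 1.0 - p2)  # Eq. (3)
    return g1, g2, g1_tilde


print(figures_of_merit(0.9, 0.9, 0.8, 0.8))
print(figures_of_merit(1.0, 1.0, 0.6, 0.6))  # perfect 1WM, heavy fragmentation
```

The second call illustrates the motivation for g̃_1: a catalog with perfect one-way statistics but heavy fragmentation or overmerging (c_2 = p_2 = 0.6) has g_1 = 0, whereas g̃_1 ≈ 0.57 flags the problem.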
We remind readers that these statistics compare the
reconstructed group catalog to the real group catalog,
i.e. to the groups that are in principle detectable within
zCOSMOS.

3.2. Optimization strategy 

The basic group-finding algorithms we apply are the
FOF and VDM algorithms that were described in
Section 3.1 of K09. The main task is to optimize the
group-finding parameters such that the resulting catalog
exhibits the best possible statistics. For the 10k sample
the group-finding strategy was mainly driven by
minimizing g_1(N) for several richness classes. However,
since g_1(N) is based only on 1WM statistics, it does not
account for fragmentation or overmerging in the resulting
catalog. Thus, if optimized for g_1(N), the resulting
catalog might contain unnecessarily many overmerged
or over-fragmented groups, which will exhibit very good
one-way statistics but very poor two-way statistics. A
reconstructed group that is fragmented or overmerged will
fail to tell us anything about the true nature of the group,
such as its mass, richness, or radius. It will only tell us
whether a certain galaxy is a group galaxy or not. Therefore,
the number of such groups should be kept as low as possible.
This is why we decided in the present work to optimize
the parameters for the modified g̃_1(N) instead of g_1(N).

Optimizing the single-richness runs with respect to g̃_1
instead of g_1 will, of course, yield slightly worse g_1 values
for the single runs. This, however, does not have to be
true for the g_1(N) statistics of the global multi-run
catalog. The combination of several single runs with inferior
g_1 statistics can lead to a multi-run catalog with slightly
superior g_1 for small N compared with the multi-run catalog
built from the g_1-optimized single runs. This seeming paradox
is resolved by noting that, in a multi-run scheme, the single
runs can interfere in a complicated, nontrivial way. For
instance, if the first run, being optimized for large groups,
aims to produce a very complete catalog, it will lead to
overmerging of some parts of small groups, which
cannot then be detected in later runs. As a result, the
first run can already spoil the g_1 statistics of the small
groups.

How can the parameters of the single runs be optimized 
in order to produce an optimal multi-run catalog? This 
is probably the most difficult part in the overall group- 
finding procedure and, unfortunately, there is no general 
prescription in order to produce "the" unique optimal 
multi-run catalog. In principle, one would have to ana- 
lyze the statistics of the multi-run catalog for all possible 
parameter combinations of the single runs. This would 
not only be computationally very expensive, but would 
also require a distinct single figure of merit for
characterizing a whole catalog.²

Thus, a manageable way of producing an optimized
multi-run catalog is to first produce a couple of optimized
single runs and then try different combinations, always
keeping an eye on g̃_1(N) and the number of reconstructed
groups N_rec(N). As a guideline, the parameters of the
single runs which are to be combined into the multi-run
should not exhibit any large discontinuities as a function
of richness. That is, the parameters of the multi-run
should be slowly varying as we move down to smaller
and smaller groups.

² It is unlikely that such a single optimal figure of merit exists.
For instance, the optimal catalog with respect to g̃_1 over the whole
range of group sizes is not necessarily also the optimal catalog
with respect to the produced number of reconstructed groups N_rec,
since we found that almost equally good catalogs with respect to g̃_1
can exhibit substantial differences in N_rec.

Table 1

Multi-run parameter sets for FOF

Step   N_min   N_max   b       l_max (Mpc)^a   R
1      11      500     0.1     0.375           18.5
2      7       10      0.095   0.38            14.5
3      6       6       0.09    0.35            16
4      5       5       0.085   0.375           13.5
5      4       4       0.075   0.3             19.5
6      3       3       0.09    0.275           18.5
7      2       2       0.06    0.225           16.5

^a Physical length.

While this approach works quite well for FOF, it is
less convenient for the VDM parameters, because their
effect on the final catalog statistics is much harder to
anticipate intuitively, and it is even harder to anticipate
the effect of different combinations of single runs. The
final parameter sets for the FOF and VDM 20k multi-run
catalogs are given in Tables 1 and 2, respectively.
Note that the justification for these particular parameter
sets rests only on the extremely good statistics of
the final product (see Sect. 3.4) and not on any rigorous
optimization procedure. Moreover, we have checked
that applying these group-finding parameters
to the actual data yields consistent behavior between the
actual data and the mock catalogs, e.g. in the number of
1WMs between FOF and VDM (cf. Fig. 7 of K09).

Looking at Figure 1, one might be tempted to introduce
a spatially variable linking length to account for
the variations in the projected density of galaxies caused
by variations in the SSR. We tested this by implementing,
for example, linking lengths that vary sinusoidally along
the right ascension axis, producing slightly larger values
in the underdense strips than in the overdense strips in
Figure 1. Interestingly, our optimization scheme preferred
a non-varying linking length. The reason is that, except
at low redshifts z < 0.3, the FOF linking length l_⊥ is set
by the maximum linking length l_max (see K09), which is
chosen to be of the order of the expected physical size of
the DM halos and is thus independent of the local galaxy
density. This is also demonstrated if we allow a general
functional form for the redshift dependence of the linking
length l_⊥(z): the preferred redshift dependence, in
terms of optimizing the statistics of the group catalog,
is a linking length that is basically constant in physical
space, even though the density of galaxies drastically
decreases with redshift. This is also reflected in the fact that the
statistics of the group catalog are very similar whether
we consider the full COSMOS field or only the central
region (see Table 3).
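The statement that l_⊥ saturates at l_max beyond z ≈ 0.3 can be illustrated with a short Python sketch. It assumes, as in standard FOF implementations, that the density-based linking length is b times the mean intergalactic separation n^(-1/3), and that l_∥ = R · l_⊥; both conventions reflect our reading of K09 rather than the exact code used there. The densities are purely illustrative, and all lengths are taken to be in the same units (the physical/comoving conversion is ignored).

```python
def linking_lengths(n_density, b, l_max, R):
    """Return (l_perp, l_par) for a given mean galaxy number density.

    Assumed, illustrative convention (not the exact K09 code):
      l_perp = min(b * n**(-1/3), l_max)   # capped at the maximum linking length
      l_par  = R * l_perp                  # radial stretch factor R
    All lengths are assumed to be in the same units as l_max.
    """
    mean_separation = n_density ** (-1.0 / 3.0)
    l_perp = min(b * mean_separation, l_max)
    return l_perp, R * l_perp


# Step 1 of Table 1: b = 0.1, l_max = 0.375, R = 18.5.
# Illustrative densities, from high (low z) to low (high z).
for n in (0.1, 0.01, 0.001):
    l_perp, l_par = linking_lengths(n, b=0.1, l_max=0.375, R=18.5)
    print(f"n = {n:g}: l_perp = {l_perp:.3f}, l_par = {l_par:.2f}")
```

Only the densest case stays below the cap; once the mean separation grows, l_⊥ is pinned at l_max and no longer tracks the local galaxy density, which is why a spatially varying linking length brings little benefit.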

As discussed and implemented in K09, a much more
important effect is that the optimal linking length l_⊥
depends on richness. This motivated our multi-run scheme.
As shown in Table 1, the linking lengths for the seven
different runs in this scheme differ by up to 50%.
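The multi-run idea (running FOF repeatedly, from the richest groups down to pairs, with a richness-dependent linking length) can be sketched as follows. This is a simplified, hypothetical Python sketch rather than the K09 implementation: it uses a brute-force union-find FOF on Cartesian coordinates (x, y transverse, s along the line of sight), keeps only groups whose richness lies in [N_min, N_max] at each step, and removes their members before the next step. For brevity it assumes the high-redshift regime in which l_⊥ = l_max and l_∥ = R · l_max for every step of Table 1.

```python
import math

# (N_min, N_max, b, l_max, R) from Table 1; b is unused in this simplified sketch
TABLE1 = [
    (11, 500, 0.1, 0.375, 18.5),
    (7, 10, 0.095, 0.38, 14.5),
    (6, 6, 0.09, 0.35, 16),
    (5, 5, 0.085, 0.375, 13.5),
    (4, 4, 0.075, 0.3, 19.5),
    (3, 3, 0.09, 0.275, 18.5),
    (2, 2, 0.06, 0.225, 16.5),
]


def fof(points, l_perp, l_par):
    """Brute-force friends-of-friends with separate transverse/LOS linking lengths.

    points: list of (x, y, s) tuples. Returns a list of index lists, one per
    group (isolated galaxies appear as single-member groups).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi, si) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj, sj = points[j]
            if math.hypot(xi - xj, yi - yj) < l_perp and abs(si - sj) < l_par:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def multi_run(points, steps):
    """steps: iterable of (n_min, n_max, l_perp, l_par), ordered from rich to poor."""
    remaining = list(range(len(points)))
    catalog = []
    for n_min, n_max, l_perp, l_par in steps:
        subset = [points[i] for i in remaining]
        taken = set()
        for members in fof(subset, l_perp, l_par):
            if n_min <= len(members) <= n_max:
                group = sorted(remaining[k] for k in members)
                catalog.append(group)
                taken.update(group)
        # galaxies assigned in this step are not available to later runs
        remaining = [i for i in remaining if i not in taken]
    return catalog


# High-z regime assumption: l_perp = l_max, l_par = R * l_max
STEPS = [(n_min, n_max, l_max, R * l_max) for n_min, n_max, _b, l_max, R in TABLE1]

galaxies = [(0.0, 0.0, 0.0), (0.1, 0.0, 1.0), (0.0, 0.1, 2.0),
            (5.0, 5.0, 0.0), (5.1, 5.0, 0.5), (20.0, 20.0, 0.0)]
print(multi_run(galaxies, STEPS))
```

In this toy input the triplet is only accepted in the N = 3 step and the pair in the final N = 2 step, illustrating how each run only "sees" the galaxies left over by the richer runs before it.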
Table 2

Multi-run parameter sets for VDM

Step   N_min   N_max   R_I (Mpc)   L_I (Mpc)   R_II (Mpc)   L_II (Mpc)   r (Mpc)   l (Mpc)
1      9       500     0.7         12          0.7          10           0.7       10
2      5       8       0.7         12          0.4          8            0.5       8
3      2       4       0.4         8           0.4          8            0.5       7

Note. All units of lengths are comoving.



3.3. Subcatalogs 

As in K09, we take the FOF multi-run group catalog to
be the main group catalog and use the VDM multi-run
catalog to define the galaxy purity parameter GAP_i, for
i ∈ {1, 2}, as follows: if an FOF group galaxy is also in a
VDM group such that there is a 1WM between the FOF
and the VDM group, the GAP_1 of this galaxy is set to 1,
and to 0 otherwise. Similarly, if there is a 2WM between
these groups, then GAP_2 is 1, and 0 otherwise.
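A minimal Python sketch of the GAP_i assignment follows. It assumes the K09-style matching criterion, in which group A has a 1WM to group B if more than 50% of A's members are also members of B, and a 2WM if this holds in both directions; it further assumes that a 1WM in either direction suffices for GAP_1 = 1. Groups are represented as hypothetical dictionaries mapping a group ID to a list of galaxy IDs.

```python
def one_way_match(a, b):
    """Assumed K09-style criterion: more than half of group a's members lie in group b."""
    return len(set(a) & set(b)) > 0.5 * len(a)


def galaxy_purity(fof_groups, vdm_groups):
    """Return {galaxy_id: (GAP_1, GAP_2)} for every FOF group galaxy.

    fof_groups, vdm_groups: dict mapping group ID -> list of galaxy IDs.
    """
    vdm_group_of = {g: gid for gid, members in vdm_groups.items() for g in members}
    gap = {}
    for members in fof_groups.values():
        for g in members:
            vid = vdm_group_of.get(g)
            if vid is None:  # galaxy is not in any VDM group
                gap[g] = (0, 0)
                continue
            fwd = one_way_match(members, vdm_groups[vid])  # FOF -> VDM
            bwd = one_way_match(vdm_groups[vid], members)  # VDM -> FOF
            gap[g] = (int(fwd or bwd), int(fwd and bwd))
    return gap


fof = {"A": [1, 2, 3, 4], "B": [5, 6]}
vdm = {"X": [1, 2, 3], "Y": [5, 6, 7, 8, 9]}
print(galaxy_purity(fof, vdm))
# {1: (1, 1), 2: (1, 1), 3: (1, 1), 4: (0, 0), 5: (1, 0), 6: (1, 0)}
```

Group B is fully contained in VDM group Y, but Y is mostly made of other galaxies, so its members receive GAP_1 = 1 but GAP_2 = 0.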

This concept can be generalized to a group as a whole
by computing the fraction of members of a given group
that have GAP_i ≠ 0. We define the group purity parameter
GRP_i, i ∈ {1, 2}, of a group to be the fraction of
galaxies in that group that have GAP_i = 1. By selecting
those groups with a GRP_i larger than some threshold,
we generate subcatalogs of the original FOF group catalog
with higher purity, as shown below.
The subcatalog consisting of all groups with GRP_i > 0
excludes groups that are only detected in FOF. We call
this the GRP_i subcatalog.
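Given per-galaxy GAP_i values, the group purity parameter and the corresponding subcatalog follow directly. The Python sketch below uses hypothetical dictionaries. As a consistency check against the catalog itself: the first group in the Table 4 excerpt (N = 14, GRP_2 = 0.93) has 14 spectroscopic members in Table 5, 13 of which have GAP_2 = 1, and 13/14 ≈ 0.93.

```python
def group_purity(groups, gap, i):
    """GRP_i of each group: fraction of its members with GAP_i == 1 (i in {1, 2}).

    groups: dict group ID -> list of galaxy IDs
    gap:    dict galaxy ID -> (GAP_1, GAP_2)
    """
    return {gid: sum(gap[g][i - 1] for g in members) / len(members)
            for gid, members in groups.items()}


def grp_subcatalog(groups, gap, i, threshold=0.0):
    """Keep only groups with GRP_i > threshold; threshold=0 gives the GRP_i subcatalog."""
    grp = group_purity(groups, gap, i)
    return {gid: members for gid, members in groups.items() if grp[gid] > threshold}


groups = {"A": [1, 2, 3, 4], "B": [5, 6]}
gap = {1: (1, 1), 2: (1, 1), 3: (1, 1), 4: (0, 0), 5: (1, 0), 6: (1, 0)}
print(group_purity(groups, gap, 2))    # {'A': 0.75, 'B': 0.0}
print(grp_subcatalog(groups, gap, 2))  # {'A': [1, 2, 3, 4]}
```

Raising the threshold above zero trades completeness for purity; the default of 0 corresponds to the GRP_i subcatalog defined in the text.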

It turns out that the statistics of the basic FOF catalog
and its GRP_1 subcatalog are very similar. Consequently,
we omit the latter in the following and include instead just
the GRP_2 subcatalog.

3.4. Catalog statistics for the mock catalogs 

The global properties of the 20k mock group catalogs
are summarized in Table 3 and in Figures 3-6. If
pairs are excluded, the full catalogs exhibit a completeness
c_1 ≳ 83% and a purity p_1 ≳ 83% for any richness.
If we restrict the sample to the central region and to
the redshift range 0.1 < z < 0.8, where most groups are,
the completeness for these groups even rises to c_1 ≳ 85%,
while the purity remains about the same as before.
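The one- and two-way statistics can be written compactly. The Python sketch below follows our reading of the K09 definitions: c_1 (c_2) is the fraction of real groups with a 1WM (2WM) to some reconstructed group, and p_1 (p_2) is the fraction of reconstructed groups with a 1WM (2WM) to some real group. It assumes a 1WM from A to B means that more than 50% of A's members belong to B. The per-richness binning and the cumulative versions plotted in Figure 3 are omitted.

```python
def one_way_match(a, b):
    """Assumed criterion: more than half of group a's members are in group b."""
    return len(set(a) & set(b)) > 0.5 * len(a)


def has_match(group, others, two_way):
    """True if group has a 1WM (or, if two_way, a 2WM) to any group in others."""
    for other in others:
        if one_way_match(group, other) and (not two_way or one_way_match(other, group)):
            return True
    return False


def completeness_purity(real, reco):
    """Return (c1, c2, p1, p2) for lists of real and reconstructed groups."""
    c1 = sum(has_match(r, reco, False) for r in real) / len(real)
    c2 = sum(has_match(r, reco, True) for r in real) / len(real)
    p1 = sum(has_match(g, real, False) for g in reco) / len(reco)
    p2 = sum(has_match(g, real, True) for g in reco) / len(reco)
    return c1, c2, p1, p2


real = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
reco = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]  # second group swallows two real groups
print(completeness_purity(real, reco))
```

In this toy case the overmerged group is still dominated by one real group and therefore counts as pure even in p_2, while the absorbed real pair lowers c_2 to 2/3.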

In Figure 3 the cumulative statistics of the 20k FOF
mock catalogs are shown (red line) and compared to those
of the 10k mock catalogs (black line) and the 20k GRP_2
mock subcatalogs (green line), i.e. all groups with
GRP_2 > 0. From the g_1 panel it is clear that, compared
to the 10k catalogs, the 20k catalogs constitute an
improvement of about 5%-10%, which is significant in terms
of the statistical error of the mean of the 24 mock catalogs;
this error presumably reflects the range of effects (such
as overlapping groups, the spatial distribution of galaxies in
groups, etc.) that can influence the purity and completeness.
This superiority is less obvious from a glance
at the completeness and the purity (middle panels). For
N ≥ 10 the completeness of the catalogs, both c_1 and c_2,
is similar, while the purity of the 20k catalogs is slightly
higher and the overall line is much more uniform over a
broad range of N. For N < 10 the completeness of the
10k catalogs is higher, but this deficiency is more than
balanced by the improved purity of the 20k catalogs. The
trends of the galaxy success rate S_gal and the interloper
fraction f_I are similar for the 10k and 20k catalogs, but the 20k
catalogs have significantly fewer interlopers for all N.

(Footnote) As an aside, the GRP_i catalogs are similar but not identical
to what we called the iWM, i ∈ {1, 2}, subcatalogs in K09. The
iWM catalogs contained not only a subsample of the groups of the
basic FOF catalog, but also a subsample of the members of each
group, so that the richness of a group in the iWM catalog was in
general not the same as that of the corresponding FOF group. For
the groups of the GRP_i catalogs, the richness is always the same.
In this paper, we never use the term iWM in the sense of the
K09 subcatalogs, but only to indicate the relation between
reconstructed and real groups.

Table 3

Statistics of the 20k mock group catalogs for different
observed richness ranges N

         N = 2         3 ≤ N ≤ 4     5 ≤ N ≤ 9     N ≥ 10

Full field and full redshift range

c_1      0.69 ± 0.02   0.84 ± 0.03   0.83 ± 0.04   0.84 ± 0.06
c_2      0.62 ± 0.02   0.76 ± 0.03   0.77 ± 0.04   0.80 ± 0.08
p_1      0.69 ± 0.02   0.82 ± 0.02   0.83 ± 0.04   0.84 ± 0.06
p_2      0.63 ± 0.02   0.74 ± 0.03   0.75 ± 0.04   0.78 ± 0.06
S_gal    0.70 ± 0.02   0.80 ± 0.02   0.84 ± 0.02   0.87 ± 0.02
f_I      0.30 ± 0.02   0.22 ± 0.02   0.17 ± 0.02   0.15 ± 0.02

Central region and 0.1 < z < 0.8

c_1      0.72 ± 0.02   0.85 ± 0.03   0.84 ± 0.05   0.86 ± 0.06
c_2      0.65 ± 0.02   0.78 ± 0.04   0.78 ± 0.05   0.81 ± 0.06
p_1      0.72 ± 0.02   0.82 ± 0.03   0.85 ± 0.04   0.83 ± 0.05
p_2      0.64 ± 0.02   0.73 ± 0.03   0.77 ± 0.04   0.78 ± 0.07
S_gal    0.73 ± 0.02   0.81 ± 0.03   0.85 ± 0.03   0.88 ± 0.02
f_I      0.27 ± 0.02   0.22 ± 0.02   0.16 ± 0.02   0.14 ± 0.02

Note. The numbers refer to the mean and the error bars
to the standard deviation among the 24 mock catalogs.

Overall, the 20k mock group catalogs are generally
purer than the 10k mock catalogs. In fact, they are so
pure that, as already noted above, there is almost no
difference between the FOF catalogs and the corresponding GRP_1
subcatalogs. As expected, the GRP_2 catalogs are even
purer than the FOF ones, but at the expense of
completeness. While the g_1 goodness of the GRP_2 catalogs
is worse than that of the FOF catalogs, the g_2
goodness is better for groups with N < 10. Thus, selecting only
the GRP_2 groups slightly diminishes the contamination
from overmerging and fragmentation.

The catalog statistics as a function of redshift are
shown in Figure 4 for different richness classes. It is clear
that all statistics are fairly robust over the whole redshift
range for any group richness, as was already
demonstrated for the 10k sample (cf. Fig. 9 of K09). Only at
the very high redshift end, z > 0.8, and for the smallest
groups is a weak redshift dependence apparent.

Figure 3. Cumulative statistics of the mock group catalogs as
a function of observed richness N. The mean for the 20k FOF
group catalogs is shown by the red lines, for the 20k GRP_2 group
catalogs by the green lines, and for the FOF 10k group catalogs by
the black lines. Upper left panel: the solid lines correspond to g_1
and the dashed lines to g_1. Upper right panel: interloper fraction
f_I. Middle panels: the solid lines correspond to c_1 (left) and p_1
(right) and the dashed lines to c_2 and p_2, respectively. Lower left
panel: galaxy success rate S_gal. Lower right panel: goodness g_2.
In all panels, the error bars refer to the standard deviation of the
mean. For the sake of clarity they are only shown for the 20k FOF
catalogs.

The superiority of the 20k catalogs over the 10k catalogs
can, however, only be partially assessed from Figure 3.
One of their major successes is that the new catalogs
correctly reproduce the number of groups as a function of
richness N. Figure 5 shows the abundance of
reconstructed groups N_rec (lower panel, blue line) relative
to that of real groups N_real in the mock catalogs. The
mean number of reconstructed groups clearly follows the
number of real groups extremely well for all N.
Even the scatter in the number of reconstructed groups among the 24
mock catalogs is well within the sample variance of the
real groups. Note that in K09 it was the 1WM subcatalogs
that had this property, while the basic FOF multi-run
catalogs contained too many groups at small
N (see Fig. 6 of K09).
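The lower panel of Figure 5 is a per-richness ratio of group counts. A minimal Python sketch is given below, assuming the richness of each real and reconstructed group is available as a plain list per mock catalog; the two small "mocks" are hypothetical, and the mean and scatter over the real 24 mocks would be computed the same way.

```python
from collections import Counter
from statistics import mean, stdev


def abundance_ratio(real_sizes, reco_sizes, n_values):
    """N_rec(N) / N_real(N) for each richness N in n_values with N_real(N) > 0."""
    real, reco = Counter(real_sizes), Counter(reco_sizes)
    return {n: reco[n] / real[n] for n in n_values if real[n] > 0}


# Two hypothetical mock catalogs: (real group richnesses, reconstructed group richnesses)
mocks = [
    ([2, 2, 2, 3, 3, 5], [2, 2, 3, 3, 3, 5]),
    ([2, 2, 3, 3, 5, 5], [2, 2, 2, 3, 5, 5]),
]
ratios = [abundance_ratio(real, reco, range(2, 6)) for real, reco in mocks]
for n in (2, 3, 5):
    values = [r[n] for r in ratios]
    print(n, mean(values), stdev(values))  # mean ratio and scatter among the mocks
```

A ratio close to unity at every N, with a scatter smaller than the sample variance of N_real, is the behavior described for the 20k mocks.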

4. THE SPECTROSCOPIC GROUP CATALOG 

The group catalog produced with the actual zCOSMOS
20k sample is given in Tables 4 and 5. The first
table provides a list of all groups along with their properties
and the second the corresponding group galaxy sample,
containing the spectroscopic and photometric group
populations. Regarding the construction of the photometric
group population, we refer to Section 5. In the
following, we will simply call the actual zCOSMOS FOF group
catalog the "20k group catalog". The positions of
the 20k groups in redshift space are shown in Figure 7.

Figure 4. Statistics for the FOF mock group catalogs as a function
of redshift. The four panels show different richness classes as
indicated by the labels. The solid curves indicate the completeness
c_1 (blue), purity p_1 (red), galaxy success rate S_gal (green), and
the interloper fraction f_I (black). The dashed lines correspond to
c_2 (blue) and p_2 (red). For clarity, the error bars are only shown for c_1, p_2,
and f_I; they correspond to the standard deviation among
the 24 mock catalogs. The robustness of the catalog statistics over
most of the redshift range is clear.

The basic properties of the 20k group catalog are
summarized in Table 6 and compared to those of the 10k catalog.
For N ≥ 2, the 20k catalog contains 1498 groups, almost
twice as many as the 10k catalog, and it has about four times
as many groups with N ≥ 10.

4.1. Group robustness 

One of the main prerequisites for estimating the
properties of reconstructed groups is that the group
is reliably identified. If the group is overmerged or
fragmented, the derived properties such as mass or physical
size will be severely affected and may have little or
nothing to do with those of the real group. For reconstructed
groups that do not have a 2WM to their real groups,
we cannot even, in general, perform a unique one-to-one
comparison between the properties of the real groups and
those of the reconstructed groups. This again emphasizes
the importance for a group catalog of being optimal
not only with respect to the one-way statistics c_1 and p_1,
but also with respect to the two-way statistics c_2 and p_2.

Table 4

The zCOSMOS 20k group catalog (excerpt)

Group ID  N^a  N_corr^b  α_gr (deg)^c  δ_gr (deg)^c  z_gr^d   r_fudge (Mpc)^e  σ (km s^-1)^f  log(M_fudge/M_⊙)^g  GRP_2^h
0         14   33        150.02209     2.01328       0.0787   0.646            433            13.51               0.93
1         30   54        150.35758     2.44265       0.1230   0.652            454            13.56               0.63
2         33   52        149.86613     1.76547       0.1245   0.674            587            13.52               0.61
3         14   28        150.42153     2.44418       0.2160   0.532            298            13.45               1.00
4         14   97        150.20008     1.65232       0.2202   0.722            1008           13.69               0.93
5         17   36        150.10545     2.36170       0.2201   0.577            745            13.44               0.94
6         20   27        150.45635     2.68079       0.2186   0.515            642            13.42               0.95
7         17   28        150.04641     2.43245       0.2200   0.532            662            13.40               0.71
8         15   16        150.23142     2.55729       0.2199   0.627            418            13.49               0.87

Note. This table is available in its entirety in a machine-readable form in the online journal. A portion is shown
here for guidance regarding its form and content.
^a Number of spectroscopic members.
^b Corrected richness with respect to the flux limit (see Sect. 6.1).
^c Improved group centers defined in Section 6.3.
^d Mean redshift of the spec-z group members.
^e Fudge radius in physical Mpc (see Sect. 4.2).
^f Velocity dispersion for groups with N ≥ 5 (see Sect. 4.2).
^g Fudge mass for the DM halo (see Sect. 4.2).
^h Group purity parameter GRP_2 (see Sect. 3.3).

Table 5

Spec-z and photo-z group galaxies (excerpt)

Galaxy ID  Group ID  20k flag^a  GAP_2^b  α (deg)     δ (deg)   z^c      log(M_*/M_⊙)^d  (e)    (f)    (g)
819041     0         1           1        149.99837   2.03514   0.0789   9.33            0.92   0.00   0.00
818934     0         1           1        150.02406   1.96865   0.0779   8.30            0.89   0.00   0.00
818888     0         1           1        150.03653   2.02487   0.0794   7.85            0.96   0.00   0.00
819026     0         1           1        150.00038   1.97859   0.0802   10.05           0.90   0.00   0.00
818839     0         1           1        150.04871   2.07792   0.0775   8.17            0.82   0.00   0.00
819133     0         1           1        149.96812   2.06726   0.0779   8.17            0.80   0.00   0.00
819032     0         1           0        149.99948   1.98699   0.0805   10.22           0.92   0.00   0.00
819060     0         1           1        149.99123   1.99116   0.0797   8.11            0.91   0.00   0.00
818935     0         1           1        150.02393   2.07273   0.0779   7.97            0.85   0.00   0.00
818815     0         1           1        150.05394   2.03343   0.0785   8.47            0.91   0.00   0.00
819118     0         1           1        149.97241   2.10540   0.0781   7.99            0.71   0.00   0.00
819104     0         1           1        149.97723   2.00483   0.0779   10.16           0.89   0.00   0.00
818982     0         1           1        150.01341   2.02956   0.0791   10.70           0.96   0.16   0.96
818787     0         1           1        150.06047   2.00672   0.0785   10.48           0.91   0.01   0.00
700213     0         0           -1       150.07021   1.85821   0.1029   8.19            0.19   0.00   0.00
700241     0         0           -1       149.98257   1.80462   0.0964   7.87            0.09   0.00   0.00

Note. This table is available in its entirety in a machine-readable form in the online journal. A portion is shown here for
guidance regarding its form and content.
^a 1 if spec-z is available, otherwise 0.
^b Galaxy purity parameter for spec-z members (see Sect. 3.3); -1 for photo-z members.
^c Spec-z if available, otherwise photo-z.
^d Stellar mass (computed without considering mass return; see Sect. 2.2).
^e Association probability (see Sect. 5.1).
^f Probability of being the most massive group galaxy (see Sect. 6.2).
^g See Sect. 6.4.

Table 6

Basic statistics for the zCOSMOS 10k and 20k group catalogs

               10k                          20k
               N_gr^a   GRP_1^b   GRP_2^b   N_gr^a   GRP_1^b   GRP_2^b
N = 2          514      0.79      0.79      932      0.89      0.89
3 ≤ N ≤ 4      184      0.81      0.77      374      0.91      0.89
5 ≤ N ≤ 9      91       0.95      0.87      151      0.87      0.81
N ≥ 10         11       1.0       0.93      41       0.90      0.78

^a Number of groups.
^b Fraction of groups in the corresponding GRP_i, i ∈ {1, 2}, subcatalog.

Figure 5. Number of groups N_gr as a function of observed richness
N. The upper panel shows the absolute number of groups
and the lower panel the number relative to the number
of real groups within the mock catalogs. Shown are the 20k FOF
catalog (red solid line), the 20k GRP_2 catalog (red dashed line),
and the mean of the 20k FOF mock catalogs (blue line). The error
bars indicate the standard deviation among the 24 mock catalogs,
and the gray shaded area corresponds to the sample variance of
the real 20k mock groups.

Figure 6 shows the fractions of groups, as a function
of observed richness N, that have each of the four possible
kinds of association: a full 2WM, a 1WM from reconstructed
to real groups only (i.e. fragmentation), a 1WM in the
opposite direction only (i.e. overmerging), and no association at
all. The percentage of reconstructed groups exhibiting a
2WM to real groups is >75%. Of the remaining ~25%,
the fraction of overmerged groups is higher than that
of fragmented groups. It should also be noted that for
groups with N ≥ 5 there are almost no spurious groups.
That is, essentially every group that is found constitutes
a real physical structure in the universe, but in 20%-30%
of cases the group-finder has made it significantly
too small or too big (by a factor of more than two in
membership) compared to the real group. The fact that
Figure 6 is basically independent of N is a consequence
of the application of the multi-run scheme (see Sect. 3.2).
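The four association classes of Figure 6 can be assigned per reconstructed group as in the Python sketch below. It assumes that a 1WM from A to B means more than 50% of A's members belong to B. A reconstructed group that has 1WMs in both directions, but to different real groups, is counted here as fragmentation; this tie-break is our own illustrative choice, not one taken from the text.

```python
def one_way_match(a, b):
    """Assumed criterion: more than half of group a's members are in group b."""
    return len(set(a) & set(b)) > 0.5 * len(a)


def classify(reco_group, real_groups):
    """Return '2WM', 'fragmentation', 'overmerging' or 'spurious'."""
    fwd = {k for k, r in enumerate(real_groups) if one_way_match(reco_group, r)}  # reco -> real
    bwd = {k for k, r in enumerate(real_groups) if one_way_match(r, reco_group)}  # real -> reco
    if fwd & bwd:
        return "2WM"
    if fwd:
        return "fragmentation"
    if bwd:
        return "overmerging"
    return "spurious"


real = [[1, 2, 3, 4, 5, 6], [7, 8], [9, 10, 11]]
for reco in ([1, 2, 3, 4], [1, 2, 3], [7, 8, 9, 10, 11, 12, 13], [20, 21]):
    print(reco, classify(reco, real))
```

Summing these labels per richness bin and dividing by the number of reconstructed groups gives the layered fractions of Figure 6; the 2WM fraction is p_2(N).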

Figure 6. Portions of the different possible associations between
reconstructed and real groups for the 20k mock catalogs as a function
of observed richness N. The bottom layer shows the fraction
of reconstructed groups having a 2WM to real groups; the surface
of this layer is equal to p_2(N). The dark gray layer shows the
portion of 1WMs (that are not 2WMs) from reconstructed groups
to real groups ("fragmentation") and the light gray layer the
corresponding portion from real groups to reconstructed groups
("overmerging"). The white area corresponds to the portion for which
no association exists ("spurious groups"). The dashed line is a
benchmark at 0.7.

Since the FOF groups depend solely on the two quantities
l_⊥ and l_∥, which are the linking lengths perpendicular
and parallel to the line of sight, respectively, a
natural question is whether a given group is sensitive to
the particular choice of these linking lengths, or whether
slightly different values would leave the resulting group
essentially unchanged. To answer this question, we have
introduced a "group robustness" parameter f_rob(f) for
each group by running the group-finder with the linking
lengths f · l_⊥ and f · l_∥, parameterized by the scale
factor f, and computing for each group the fraction

$$
f_{\rm rob}(f) =
\begin{cases}
N(f)/N, & \text{if } f < 1, \\
N/N(f), & \text{if } f > 1,
\end{cases}
$$

where N(f) is the new richness of that group. This
ensures that f_rob(f) takes only values between 0 and 1
and that the robustness increases with f_rob(f), with
f_rob(f) = 1 indicating a highly robust structure. f_rob(f) is
a measure of how sensitive the implied membership is to
changes in the linking length. For f < 1 it probes the
robustness with respect to fragmentation and for f > 1 with
respect to overmerging.
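Once the group-finder has been rerun with scaled linking lengths, the robustness parameter is simple bookkeeping: f_rob(f) = N(f)/N for f < 1 and N/N(f) for f > 1. The Python sketch below assumes that N(f) has already been determined for each group (how the new richness of a fragmented group is identified is not specified here) and that N(f) ≤ N for f < 1 and N(f) ≥ N for f > 1, so that f_rob(f) lies between 0 and 1. The (N, N(f)) pairs are hypothetical.

```python
def robustness(n, n_f, f):
    """Group robustness f_rob(f) for original richness n and rescaled richness n_f."""
    if f <= 0:
        raise ValueError("scale factor f must be positive")
    if f < 1:
        return n_f / n   # shrinking the linking lengths probes fragmentation
    if f > 1:
        return n / n_f   # growing the linking lengths probes overmerging
    return 1.0           # f == 1 reproduces the original group


def robust_fraction(pairs, f, threshold):
    """Fraction of groups with f_rob(f) >= threshold; pairs = [(N, N(f)), ...]."""
    values = [robustness(n, n_f, f) for n, n_f in pairs]
    return sum(v >= threshold for v in values) / len(values)


halved = [(10, 5), (8, 8), (6, 2), (12, 9)]  # hypothetical (N, N(f = 0.5)) pairs
print(robust_fraction(halved, 0.5, 0.75))
```

Because f_rob(f) needs only the group-finder output, the same function can be applied to the real catalog and to the mocks alike.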

Figure 8 shows the results for f = 0.5 and f = 2 for
20k FOF mock groups in the richness classes 5 ≤ N ≤ 9
(red lines) and N ≥ 10 (black lines). These results are
not sensitive to the precise value of f. The upper panels
show the p_2 statistics for the corresponding f_rob(f)-selected
subsamples. Reducing the linking length tends
to have a bigger effect than increasing it. Roughly 50%
of the groups in the mock catalogs lose half of their
members when the linking lengths are halved, but only 25%
of the groups double their membership when the linking
lengths are doubled. As would be expected, the overall
purity increases strongly as the robustness f_rob(f)
approaches unity, both for f larger and for f smaller than
one, and groups whose membership is stable to changes
in the linking lengths are, not surprisingly, likely to be
the purest. However, the lower panels make clear that
raising the purity of subsamples significantly by applying
cuts in the robustness comes at the expense of losing
many groups.

For the actual 20k group catalog, the group fraction
is shown by the dashed lines in Figure 8. In particular,
the big groups are significantly less robust with respect to
fragmentation than the corresponding mock groups. We
do not know the reason for this, but it matches other
properties of the 20k group catalog, such as the lack of
high-richness groups (see Sect. 4.3). Note that, in contrast
to the completeness and purity, the group robustness is
one of the few quantities that can be computed using the
actual data without the need for mock catalogs and thus
allows a direct comparison with simulated data.

Figure 7. Positions of the zCOSMOS 20k groups in redshift space. The groups are plotted as a function of right ascension and comoving
distance, where the richness N of the groups is color coded as indicated above the cone. The labels on the left side of the cone indicate the
redshift and the ones on the right side the corresponding comoving distance. Note that the transverse scale of the cone has been stretched
by about a factor of two for clarity. In reality, the comoving depth of this cone (from z = 0.1 to 1) is about 70 times longer than its
transverse comoving size at z = 0.5. The comoving transverse scale of the cone is indicated by the horizontal bar at the top. The clustering
of the groups and the cosmic large-scale structure are clearly visible up to the highest redshifts.

Figure 8. Purity and group fraction as a function of group robustness.
The solid lines show the median and the error bars the upper
and lower quartiles of the 24 20k mock catalogs. The dashed lines
are the results for the actual 20k group catalog. Red lines
correspond to the richness class 5 ≤ N ≤ 9 and black lines to
N ≥ 10.

4.2. Estimates of physical properties 

As pointed out in K09, we are able to estimate the
velocity dispersion σ_v for groups with N ≥ 5 and σ_v ≳ 350
km s^-1 to an accuracy of about 25%. On the other hand,
a reliable estimation of the dynamical mass by means of the
virial theorem has proved to be very difficult, not only
because the error of the velocity dispersion enters the
virial theorem quadratically, but also because reliable
estimates of the virial radius are very hard to obtain. Using
the mock catalogs, we found that the apparent projected
extent of a group hardly correlates at all with the
virial radius of the corresponding DM halo. The
unavailability of reliable dynamical mass estimates is one
major shortcoming of our group catalog, and of others
constructed in similar ways. To have at least an idea of
the typical mass of the groups, we introduced in K09 the
so-called fudge mass, taking the corrected richness Ñ
of the group (i.e. the observed richness N corrected for SSR
and RSR) at a given redshift z as a proxy for its mass.
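To see why the quadratic dependence matters, consider the virial mass estimator in its generic scaling form, M ∝ σ_v² R/G (prefactors omitted). Propagating independent fractional errors to first order gives the following rough estimate:

$$
M_{\rm vir} \propto \frac{\sigma_v^2\,R_{\rm vir}}{G}
\quad\Longrightarrow\quad
\frac{\delta M_{\rm vir}}{M_{\rm vir}} \simeq
\sqrt{\left(2\,\frac{\delta\sigma_v}{\sigma_v}\right)^2 + \left(\frac{\delta R_{\rm vir}}{R_{\rm vir}}\right)^2}
\;\geq\; 2 \times 0.25 = 0.5 .
$$

The ~25% accuracy in σ_v alone therefore already translates into a ~50% mass uncertainty, before any error in the virial radius is included.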

Table 7

Errors for the fudge quantities using the 20k mock catalogs

Quantity   Error        N = 2   3 ≤ N ≤ 4   5 ≤ N ≤ 9   N ≥ 10
M_fudge    Δ (dex)      0.37    0.27        0.18        0.15
v_fudge    Rel. error   21%     19%         13%         9%
r_fudge    Rel. error   27%     23%         16%         11%

In the same spirit, we can define "fudge quantities"
Q_fudge for any quantity Q that, at a given redshift,
exhibits a correlation with the corrected richness Ñ or with
another independently measurable quantity Q̃
(e.g. velocity dispersion, projected extent). That is,
a group at redshift z with corrected richness Ñ and
measured property Q̃ can be assigned a corresponding
Q_fudge defined by

$$
Q_{\rm fudge} = \left\langle Q_{\rm mock}(\tilde{N}, z, \tilde{Q}) \right\rangle , \qquad (5)
$$

where the brackets ⟨ ⟩ denote the average over all
reconstructed groups with a 2WM to real groups for which
the corresponding measured quantities are within some
range of Ñ, z, and Q̃, and Q_mock denotes the correct group
property of the corresponding real group.
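Equation (5) amounts to a look-up table built from the mocks. The Python sketch below is a hypothetical illustration: the calibration set is a list of (corrected richness, redshift, measured quantity, true quantity) tuples for reconstructed mock groups with a 2WM, and "within some range" is implemented as simple top-hat windows whose widths (dn, dz, dq) are free choices, not the values used for the catalog.

```python
from statistics import mean


def fudge_value(calibration, n_corr, z, q_obs=None, dn=1.0, dz=0.05, dq=None):
    """Mean true property of calibration groups near (n_corr, z[, q_obs]).

    calibration: list of (n_corr, z, q_obs, q_true) for mock groups with a 2WM.
    If q_obs or dq is None, the measured quantity is ignored (as for the fudge
    mass, which uses only corrected richness and redshift).
    """
    selected = [
        q_true
        for n_c, z_c, q_c, q_true in calibration
        if abs(n_c - n_corr) <= dn
        and abs(z_c - z) <= dz
        and (q_obs is None or dq is None or abs(q_c - q_obs) <= dq)
    ]
    if not selected:
        raise ValueError("no calibration groups in this window")
    return mean(selected)


# Hypothetical calibration set: (corrected richness, z, sigma_obs [km/s], log M_true)
calib = [(10, 0.30, 400, 13.5), (11, 0.32, 520, 13.9), (30, 0.30, 600, 14.2)]
print(fudge_value(calib, n_corr=10, z=0.31, dn=2))                    # richness and z only
print(fudge_value(calib, n_corr=10, z=0.31, q_obs=410, dn=2, dq=50))  # plus a measured quantity
```

Adding the measured quantity narrows the set of calibrating mock groups, which is how the fudge velocity and fudge radius use σ_v and the projected size, respectively.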

In addition to the fudge mass, we have computed
fudge estimates for the halo virial velocity ("fudge
velocity") and the halo radius ("fudge radius"). For the fudge
velocity we have used the apparent velocity dispersion
as the measured quantity and, for the fudge radius, the
apparent projected size of the group, as defined below. The
scatter of the estimated quantities compared to the true
quantities for reconstructed 20k mock groups exhibiting a 2WM
to real groups is given in Table 7. As expected, the
errors decrease with increasing observed richness N. Note
that the fudge quantities must not be mistaken for real
physical estimates of the corresponding quantities; they
are rather "typical" values calibrated using the mock
catalogs.

4.3. Number of groups as a function of N 

The most straightforward way to compare the actual
data with the mock data is by means of the number of
reconstructed groups as a function of observed richness
N. This is shown in Figure 5. Compared to the mock
data, the number of groups in the 20k group catalog is
mostly within the range expected from sample variance
among the 24 mock catalogs. Since for the 20k mock
catalogs the number of reconstructed groups traces the
number of real groups very well for any richness, there is
no need to distinguish between them.

The overall slope of the N_gr(N) function for the actual
data, however, is steeper than for the mock data. In
particular, the number of groups with two and three members
is about 25%-50% higher than in the mock catalogs, and
for N > 18 there is a significant lack of groups in the 20k
sample compared with the mock catalogs. Both trends
were already noted for the 10k sample in K09 and are
now confirmed with the larger 20k sample. The excess
of groups with N ≤ 3 cannot be blamed on the existence
of secondary objects (serendipitous observations in
the spectroscopic slits), which could boost the number of
small groups, since the fraction of such objects is only
about 2%. Interestingly, a significant lack of high-richness
groups relative to the Millennium simulation has
recently also been reported for the large GAMA FOF
group catalog at low redshift (Robotham et al. 2011),
which indicates that this lack is not a peculiarity of the
COSMOS field.

It should be particularly noted that many of the
individual mock catalogs contain groups that are much
larger than those in zCOSMOS. While the largest group
in the 20k sample has 33 members, there are on average
about 3-4 groups with N > 40 per 20k mock catalog and
1-2 groups with N > 60. These huge groups are not an
artifact of our group-finding algorithm, but are present
as real groups in the mock catalogs. Since the high-mass
end of the halo mass function is very sensitive to the
amplitude of the matter power spectrum in the universe,
σ_8, the large number of big groups in the mock catalogs
could reflect the fact that σ_8 = 0.9 for the Millennium
simulation is too large relative to recent measurements
of σ_8 ≈ 0.82 ± 0.02 (e.g. Komatsu et al. 2011).

A direct measurement of σ_8 by means of the group
mass function is, however, very difficult, for two main
reasons. First, it is the high-mass end of the mass function
that is most sensitive to σ_8 and where our catalog is
most complete (see Fig. 2). Owing to the relatively small
volume of zCOSMOS, we are in the regime of low-number
statistics for such high masses and are thus affected
by cosmic variance, particularly at low redshift. Second,
we checked that a mass cut by means of the fudge mass
would introduce some mass-dependent systematics into
the mass function estimation, so that a robust estimation
of σ_8 would require improved mass estimates. However,
the fact that there is no group in the 20k group catalog
with M_fudge > 2 × 10^14 M_⊙, while there are ~3.5 such groups
on average in each mock catalog, certainly favors a low σ_8. Only three
out of 24 mock catalogs (i.e. 12.5%) contain no group
with such a high fudge mass.
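The "three out of 24" statement is an empirical frequency over the mock ensemble. A minimal Python sketch is given below, with hypothetical per-mock lists of log fudge masses; the threshold is a parameter, and the value used in the example is illustrative only.

```python
import math


def fraction_without_massive(mock_log_masses, log_threshold):
    """Fraction of mock catalogs containing no group above log_threshold."""
    empty = sum(all(m <= log_threshold for m in masses) for masses in mock_log_masses)
    return empty / len(mock_log_masses)


# 24 hypothetical mocks: 3 without any group above the threshold, 21 with at least one
mocks = [[13.4, 13.9]] * 3 + [[13.6, 14.5]] * 21
print(fraction_without_massive(mocks, math.log10(2e14)))  # 0.125
```

Applied to the real 24 mocks, this frequency is the rough probability of drawing a group catalog as devoid of massive groups as the observed one.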

At this point it is interesting to come back to the
findings on group robustness in Section 4.1. We noted
that for big groups the group robustness with respect to
fragmentation is significantly lower than for the
corresponding mock groups (Fig. 8, black dashed lines). This
points in the same direction as the detected lack of big
groups: not only are there fewer big groups in the zCOSMOS
group catalog than in the mock catalogs, but the observed
groups are also less robust.

4.4. Fraction of galaxies in groups 

A quantity closely related to the number of groups in 
a catalog is the fraction of galaxies that are in groups. 
Since the number of groups traces roughly the number 
of galaxies in zCOSMOS (cf. Fig. 12 of K09), comput- 
ing fractions of galaxies in groups instead of the abso- 
lute number of groups diminishes the effect of large-scale 
structure and associated cosmic variance. A measure- 
ment of this fraction allows further comparison with the 
mock catalogs and allows us to trace the buildup of the 
cosmic group environment over time. The analysis in this 
section will be entirely restricted to the central region of 
the zCOSMOS survey (see Fig. 1). 

The fraction of galaxies in groups for the full flux-limited 20k group catalog and 20k galaxy sample is shown in Figure 9 as a function of redshift for N ≥ 2 and N ≥ 5. The overall behavior of the fraction of galaxies in 20k groups (red line) matches that of the reconstructed (or real) mock groups quite well, at least in the redshift range z < 0.6. At the highest redshifts, the fraction of group galaxies in zCOSMOS is significantly higher than in the mock catalogs. The reason for this is unclear. It may indicate a problem with the ability of the semi-analytic models to follow the evolution of galaxies. Most of these highest-redshift groups are only detected as pairs, which raises possible worries about the sampling of objects. However, the red dashed line corresponds to groups that are still detectable even if all secondary objects were discarded, so this is not the cause of the effect. Furthermore, the excess is also visible for much richer systems (lower panel in Fig. 9). It is noticeable (particularly for the lower panel) that the fraction of

Figure 9. Fraction of galaxies in groups as a function of redshift for the whole flux-limited galaxy and group samples. The samples are restricted to the central region and show groups with N ≥ 2 in the upper panel and N ≥ 5 in the lower panel. The red solid line shows the fraction of galaxies in zCOSMOS 20k groups, and the red dashed line shows the corresponding fraction if only those groups are considered that are detectable without the existence of secondary objects. The black line shows the mean fraction of galaxies in real 20k mock groups, and the green line shows the mean fraction of galaxies in reconstructed 20k mock groups. The error bars indicate the standard deviation among the 24 mock catalogs. The mock catalogs are in fair agreement with the actual data for z < 0.6 but contain significantly too few groups for z > 0.6.

galaxies in groups is enhanced at the redshifts z ≈ 0.35 and z ≈ 0.70, where there are very large-scale structures in the COSMOS field (cf. Fig. 1 of K09 and Fig. 7 in this paper). At low redshift the total fraction of galaxies in groups is about 40%. This is consistent with the results from the low-redshift GAMA group catalog (Robotham et al. 2011), despite the different limiting fluxes of the two surveys, and presumably reflects the weak dependence of the satellite fraction on galaxy mass.

To get a clearer view of the buildup of the group environment over cosmic time, it is better to work with volume-limited samples of galaxies and groups, or with the closest approximations to them that can be constructed. We can approximate a volume-limited sample of galaxies by applying a cut in absolute magnitude that evolves with redshift. This accounts, at least roughly, for the luminosity evolution of individual galaxies. We apply the cut as

$$M_B < M_{B,\mathrm{lim}} - z \qquad (6)$$

for different absolute magnitude limits $M_{B,\mathrm{lim}}$. We performed the analysis with three magnitude limits: $M_{B,\mathrm{lim}} = -19.75$, $-20.25$, and $-20.75$. The resulting galaxy populations are complete at least up to z = 0.8.

Figure 10. Completeness of mock groups containing at least two members brighter than $M_{B,\mathrm{lim}} - z$. The lines correspond to the mean of the 24 mock catalogs for different absolute magnitude limits (blue: $M_{B,\mathrm{lim}} = -19.75$; red: $M_{B,\mathrm{lim}} = -20.25$; green: $M_{B,\mathrm{lim}} = -20.75$), and the error bars correspond to the standard deviation of the mean. For the redshift range 0.1 < z < 0.8, the completeness is fairly constant for all three magnitude limits.

To construct a volume-limited sample of groups, we select all groups with at least two members brighter than $M_{B,\mathrm{lim}} - z$. We use the observed richness rather than the richness corrected for SSR and RSR. This avoids the scatter introduced by potentially large completeness corrections. This procedure is not perfect. For instance, at low redshift two galaxies may be linked by others below the absolute magnitude cut to form a "group". That group would go undetected at high redshifts, where the absolute magnitude limit is closer to the flux limit of the spectroscopic survey. This could lead to a redshift-dependent $c_1$ and/or $p_1$. However, Figure 4 shows that the redshift dependence of $c_1$ and $p_1$ is negligible over the redshift range considered here.
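As a minimal sketch of the selection just described, assume that each group is supplied as a list of (M_B, z) pairs for its observed spectroscopic members. This is a hypothetical input format, not the catalog's actual schema. Under that assumption, the evolving cut of Equation (6) and the two-bright-member requirement can be written as follows. The function names are illustrative only.

```python
def passes_cut(abs_mag_b, z, m_lim):
    """Equation (6): keep a galaxy if M_B < M_B,lim - z."""
    return abs_mag_b < m_lim - z


def volume_limited_groups(groups, m_lim):
    """Return the groups with at least two observed members passing Eq. (6).

    `groups` maps a group id to a list of (M_B, z) tuples for its observed
    spectroscopic members (hypothetical format). The observed richness is
    used, with no SSR/RSR completeness correction.
    """
    selected = {}
    for gid, members in groups.items():
        bright = [m for m in members if passes_cut(m[0], m[1], m_lim)]
        if len(bright) >= 2:
            selected[gid] = bright
    return selected


# Hypothetical example with M_B,lim = -20.25
groups = {
    "g1": [(-21.0, 0.5), (-20.9, 0.5), (-19.0, 0.5)],  # cut at z = 0.5 is -20.75
    "g2": [(-20.9, 0.7), (-19.5, 0.7)],               # cut at z = 0.7 is -20.95
}
print(sorted(volume_limited_groups(groups, -20.25)))  # ['g1']
```

Group "g2" drops out because, at its redshift, the evolving cut is brighter than either of its members.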

To address these and other concerns, Figure 10 compares the number of reconstructed mock groups with the number of all groups in the mock catalogs that host at least two bright galaxies. This count includes groups irrespective of whether they are detectable within the 20k mock samples. The resulting completeness is therefore lower than that shown in Figure 3, where only the "detectable" groups were considered as the parent sample. The completeness computed in this way is fairly constant in the redshift range 0.1 < z < 0.8 for all three absolute magnitude cuts. This reassures us that there are no strong systematic biases as a function of redshift for the groups selected by absolute magnitude.

Figure 11. Fraction of galaxies in groups for volume-limited galaxy and group samples. The blue, red, and green solid lines show the results for the zCOSMOS 20k data (blue: $M_{B,\mathrm{lim}} = -19.75$; red: $M_{B,\mathrm{lim}} = -20.25$; green: $M_{B,\mathrm{lim}} = -20.75$). The error bars for the actual data are obtained by bootstrapping, and the dashed lines show linear fits to the data points. The gray lines are the corresponding mean curves of the 24 FOF mock catalogs, where the luminosity increases for the lower curves. Their error bars show the standard deviation of the mean. This figure demonstrates the buildup of the cosmic group environment over the redshift range 0.2 < z < 0.8.

Having established that our "volume-limited" samples should be free of strong redshift-dependent biases, we show the fraction of galaxies in groups in Figure 11. For the 20k sample, we again see the signatures of the large structures at redshifts z ≈ 0.35 and z ≈ 0.70, as in Figure 9. This could indicate that the luminosity function of galaxies in groups is environment dependent. Nevertheless, there is a clear overall trend: the fraction of (volume-limited) galaxies in (volume-limited) groups increases significantly with decreasing redshift, as indicated by the dashed lines. This demonstrates the buildup of the cosmic group environment over a large fraction of the last 7 billion years. This result is insensitive to the precise form of the redshift correction in Equation (6).
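The error bars and dashed lines in Figure 11 could be produced along the following lines. This is a sketch only, and it rests on several assumptions. It uses one boolean membership flag per galaxy in each redshift bin. It resamples galaxies (not groups) with replacement. It fits an unweighted least-squares line. The actual bootstrap and fitting choices behind the figure are not specified in the text, and the binned data here are invented.

```python
import random
import statistics


def group_fraction(flags):
    """Fraction of galaxies flagged as group members."""
    return sum(flags) / len(flags)


def bootstrap_error(flags, n_boot=500, seed=1):
    """Standard deviation of the group fraction over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(flags)
    fractions = [
        group_fraction([flags[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.stdev(fractions)


def linear_fit(x, y):
    """Unweighted least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical bins: redshift centre -> membership flags of bright galaxies
bins = {0.3: [True] * 40 + [False] * 60,
        0.5: [True] * 30 + [False] * 70,
        0.7: [True] * 20 + [False] * 80}
z = sorted(bins)
frac = [group_fraction(bins[zc]) for zc in z]
errs = [bootstrap_error(bins[zc]) for zc in z]
intercept, slope = linear_fit(z, frac)
print(frac, slope)  # negative slope: the group fraction grows toward low z
```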

Curiously, the observed fraction of (bright) galaxies in the 20k groups is significantly higher than in the mock groups. This holds regardless of whether the flux limit for the mock catalogs is adjusted (see Sect. 2.3). The fraction in the mock catalogs, however, approaches that of the 20k sample as we go to fainter galaxies at the flux limit (in agreement with Figure 9). This suggests that the discrepancy in Figure 11 could be caused by a problem with the magnitudes of bright galaxies in the COSMOS mock light cones.

5. PHOTOMETRIC GROUP MEMBERS 

For some applications it is very useful to have a galaxy sample that is complete down to a magnitude limit. For example, to study the most massive galaxies in groups, we must ensure that these galaxies are present in the sample. Even the 20k sample is only about 55% complete, so the spectroscopic group catalog is not yet optimized for this kind of study. On the other hand, zCOSMOS targets the COSMOS field, which has been observed in many wavelength bands. We would therefore like to use all the available data to improve the group catalog. These data include high-quality photo-z catalogs for all galaxies in the COSMOS field down to $I_{AB} = 22.5$. In this section we present our method of populating the spectroscopic groups discussed in the previous section with photo-z galaxies on a probabilistic basis.

Although there are in principle ways to detect groups in photometric galaxy samples (e.g. Li & Yee 2008; Gillis & Hudson 2011), we will only use the groups detected by spectroscopic galaxies. We will not use photo-z galaxies to detect new groups. Our resulting group sample will therefore miss all groups in the sky that do not have more than one spectroscopic member. Figure 2 shows the fraction of groups missed for this reason. It plots the fraction of detectable halos (i.e. those with two or more galaxies above the zCOSMOS flux limit) that actually had two or more galaxies observed spectroscopically, after the incomplete spatial sampling and redshift success rates are applied.

5.1. Assigning probabilities to photo-z galaxies 

Although the photo-z errors of ~0.01(1 + z) are impressively small by normal standards, we cannot incorporate these galaxies directly into the group-finding scheme. Nor can we assign them to groups in a unique and reliable way. Some group galaxies might appear at large distances from the group center in redshift space, and some galaxies could be candidates for several groups. However, we can attempt to quantify the probability that a galaxy is associated with a given group. This probability will depend on the distance from the group center, both in the plane of the sky and in the redshift dimension. We can again use the mock catalogs to determine these probabilities, just as we used them to fine-tune the group-finding algorithm. The association probability may also depend on the luminosity or stellar mass of the galaxy in question. However, this dependence may be sensitive to the galaxy evolution prescription in the COSMOS mock light cones, and one of our scientific goals is to use the group catalog to test such relations. We therefore decided not to use this additional information in estimating association probabilities.

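Suppose we have a group at $(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}}, z_{\mathrm{gr}})$ in redshift space and a nearby galaxy at $(\alpha, \delta, z)$ with a redshift error of $\delta_z$. We parameterize the distance of the galaxy from the group by the scaled, dimensionless offsets perpendicular and parallel to the line of sight,

$$\sigma_r = \frac{r(\alpha, \delta, \alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}}, z_{\mathrm{gr}})}{r_{\mathrm{gr}}}, \qquad \sigma_z = \frac{|z - z_{\mathrm{gr}}|}{\delta_z}, \qquad (7)$$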

where $r(\alpha, \delta, \alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}}, z_{\mathrm{gr}})$ is the physical distance of the galaxy from the group center perpendicular to the line of sight, and $r_{\mathrm{gr}}$ is a measure of the projected physical extension of the group. A suitable group extension parameter $r_{\mathrm{gr}}$ should ideally scale with the virial radius of the group, and $(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}})$ should approach the center of the underlying DM halo. No unique estimators satisfy these requirements, so we will consider different possibilities and discuss their relative strengths using the mock catalogs.
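The scaled offsets of Equation (7) can be illustrated as follows. The sketch assumes a flat cosmology and the small-angle approximation, so that the physical transverse distance is $D(z_{\mathrm{gr}})/(1+z_{\mathrm{gr}})$ times the angular separation. The comoving distance is passed in rather than computed. All function names and example numbers are hypothetical.

```python
import math


def projected_distance(alpha, delta, alpha_gr, delta_gr, z_gr, d_comoving):
    """Physical transverse separation from the group centre.

    Angles are in radians; d_comoving is D(z_gr) (e.g. in Mpc), and the
    result is in the same units. Small-angle, flat-universe approximation.
    """
    d_alpha = (alpha - alpha_gr) * math.cos(delta_gr)
    d_delta = delta - delta_gr
    return d_comoving / (1.0 + z_gr) * math.hypot(d_alpha, d_delta)


def scaled_offsets(galaxy, group, r_gr, d_comoving):
    """Equation (7): (sigma_r, sigma_z) for one galaxy and one group.

    galaxy = (alpha, delta, z, delta_z); group = (alpha_gr, delta_gr, z_gr).
    """
    alpha, delta, z, delta_z = galaxy
    alpha_gr, delta_gr, z_gr = group
    r = projected_distance(alpha, delta, alpha_gr, delta_gr, z_gr, d_comoving)
    return r / r_gr, abs(z - z_gr) / delta_z


# A photo-z galaxy 1 mrad from the centre of a group at z = 1 with D = 2000 Mpc:
# r = 2000 / 2 * 1e-3 = 1 Mpc, so sigma_r = 2 for r_gr = 0.5 Mpc.
print(scaled_offsets((0.0, 1e-3, 1.03, 0.02), (0.0, 0.0, 1.0), 0.5, 2000.0))
```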

Regarding the group extension $r_{\mathrm{gr}}$, a natural estimator would be the root-mean-square (rms) extension of the spectroscopic members within the group, that is,

$$r_{\mathrm{gr}} = r_{\mathrm{rms}}(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}}) \qquad (8)$$

with

$$r_{\mathrm{rms}}(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}}) = \frac{D(z_{\mathrm{gr}})}{1 + z_{\mathrm{gr}}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ (\Delta\alpha_i \cos\delta_{\mathrm{gr}})^2 + (\Delta\delta_i)^2 \right]} \qquad (9)$$

and $\Delta\alpha_i = \alpha_i - \alpha_{\mathrm{gr}}$ and $\Delta\delta_i = \delta_i - \delta_{\mathrm{gr}}$. Here $(\alpha_i, \delta_i)$ is the position of the $i$th galaxy in the group, and $D(z_{\mathrm{gr}})$ is the comoving distance to redshift $z_{\mathrm{gr}}$. Note that this estimator still depends on the choice of the group center $(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}})$. The main drawback of this choice is its low correlation with the virial radius of the group in the mock catalogs. In fact, it proved very hard to estimate the virial radius from the distribution of galaxies. A second problem is that $r_{\mathrm{rms}}$ can become unrealistically small because of chance orientation effects, particularly for groups with low richness N. An alternative choice for $r_{\mathrm{gr}}$ is the fudge radius $r_{\mathrm{fudge}}$, which depends only on the richness. It has the advantage of avoiding both drawbacks of $r_{\mathrm{rms}}$.
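Equations (8) and (9) can be sketched in a few lines. The sketch uses the unweighted mean of the member positions as the center, which corresponds to the "standard centers" described in the next paragraph. It assumes a flat cosmology and the small-angle approximation, and the function names are hypothetical. The second call illustrates the chance-orientation problem: a pair seen almost along the line of sight gives an unrealistically small $r_{\mathrm{rms}}$.

```python
import math


def standard_center(members):
    """Mean (alpha, delta) of the spectroscopic members, in radians."""
    n = len(members)
    return (sum(a for a, _ in members) / n, sum(d for _, d in members) / n)


def r_rms(members, center, z_gr, d_comoving):
    """Equation (9): rms projected physical extent about `center`.

    members: list of (alpha_i, delta_i) in radians; d_comoving = D(z_gr).
    """
    alpha_gr, delta_gr = center
    mean_sq = sum(
        ((a - alpha_gr) * math.cos(delta_gr)) ** 2 + (d - delta_gr) ** 2
        for a, d in members
    ) / len(members)
    return d_comoving / (1.0 + z_gr) * math.sqrt(mean_sq)


pair = [(0.0, -1e-3), (0.0, 1e-3)]
print(r_rms(pair, standard_center(pair), 1.0, 2000.0))        # about 1 Mpc
aligned = [(0.0, 0.0), (0.0, 2e-7)]
print(r_rms(aligned, standard_center(aligned), 1.0, 2000.0))  # about 1e-4 Mpc
```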

The estimators for the group centers are discussed in detail in Section 6.3. Some of these estimators also use the photo-z information. As a benchmark for comparison, we will often simply use the average over the positions of the spectroscopic group members, which we term "standard centers". For the final computation of association probabilities, however, we have used "improved centers" (defined in Sect. 6.3). These are themselves based on association probabilities of photo-z galaxies. The final probabilities are therefore obtained by an iterative procedure, which converges after just one iteration.

Taking all reconstructed mock groups with a 2WM to real groups, we then compute the fraction $f(\sigma_r, \sigma_z, N)$ of photo-z galaxies that are members of the corresponding real group, as a function of $\sigma_r$, $\sigma_z$, and N. To obtain large enough group samples for the computation of f, we restrict the richness dependence to just four richness classes: N = 2, 3 ≤ N ≤ 4, 5 ≤ N ≤ 9, and N ≥ 10. For each galaxy and each group, the function $f(\sigma_r, \sigma_z, N)$ is then evaluated and interpreted as the probability that this galaxy is a member of this group. Since f was estimated using only the reconstructed groups with a 2WM to real groups, it does not include the effects of deficiencies in the original detection of the spectroscopic groups (cf. Fig. 3). In other words, f is the probability that a galaxy is a member of an apparent group, defined as a certain location in $(\alpha, \delta, z)$ space. It should be multiplied by the probability that the apparent group is actually real.
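The empirical calibration of $f(\sigma_r, \sigma_z, N)$ amounts to counting, in cells of $(\sigma_r, \sigma_z)$ and richness class, how often a photo-z galaxy near a 2WM mock group is a true member of the matched real group. The sketch below uses a plain two-dimensional histogram with an assumed cell width of 0.5. The actual binning and any smoothing behind Figure 12 are not specified in the text, and the input format and mock values are hypothetical.

```python
from collections import defaultdict


def richness_class(n):
    """Map observed richness N (>= 2) to the four classes used in the text."""
    if n == 2:
        return 0
    if n <= 4:
        return 1
    if n <= 9:
        return 2
    return 3


def cell(sigma_r, sigma_z, n, width):
    """Histogram cell index: (richness class, sigma_r bin, sigma_z bin)."""
    return richness_class(n), int(sigma_r // width), int(sigma_z // width)


def calibrate_f(samples, width=0.5):
    """samples: iterable of (sigma_r, sigma_z, N, is_true_member) from mocks."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [true members, all]
    for sigma_r, sigma_z, n, is_member in samples:
        c = counts[cell(sigma_r, sigma_z, n, width)]
        c[0] += int(is_member)
        c[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def lookup_f(table, sigma_r, sigma_z, n, width=0.5):
    """Association probability; empty cells (e.g. large offsets) give 0."""
    return table.get(cell(sigma_r, sigma_z, n, width), 0.0)


mock = [(0.1, 0.2, 3, True), (0.3, 0.4, 4, False), (0.2, 0.1, 2, True),
        (2.6, 3.1, 3, False)]
table = calibrate_f(mock)
print(lookup_f(table, 0.25, 0.25, 3))  # 0.5: one of two mock galaxies matched
```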

The functions $f(\sigma_r, \sigma_z, N)$ are shown in Figure 12 for the four richness classes. They are all very similar and are smooth functions that decrease strongly with increasing $\sigma_r$ and $\sigma_z$. Not surprisingly, the probability of a galaxy being a member of the group is usually much larger than the formal integral of its photo-z probability distribution over the very small redshift interval associated with the group. This is, of course, the motivation for this approach.

In this scheme a galaxy can be associated with more than one group if it lies close enough to several of them, i.e. if $f(\sigma_r, \sigma_z, N)$ is non-zero in each case. Indeed, the probabilities computed in this way may even sum to more than unity. We therefore introduce a slight modification of the assigned probabilities. If a galaxy is associated with n groups with probabilities $p_i$, i = 1, …, n, we first compute the probability that it is

Figure 12. Fraction $f(\sigma_r, \sigma_z, N)$ of photo-z galaxies associated with groups, where $\sigma_r$ is based on the fudge radius and the improved centers. The surface is the mean of f over the 24 mock catalogs, and the error of the mean is indicated by the black bars. The function f was computed empirically using the reconstructed groups exhibiting a 2WM to real groups (i.e. it does not include the effect of group detection failures). It is based on $r_{\mathrm{fudge}}$ and the improved centers (see the text). For $\sigma_r \gtrsim 3$ or $\sigma_z \gtrsim 3$ the fraction f is essentially zero.



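not a member of any group,

$$P_{\mathrm{nongr}} = \prod_{i=1}^{n} (1 - p_i). \qquad (10)$$

The probability that the galaxy is in any group is then taken to be $1 - P_{\mathrm{nongr}}$ instead of $p_{\mathrm{tot}} = \sum_{i=1}^{n} p_i$. Finally, we scale the probabilities by the ratio of these two quantities, i.e.

$$\tilde{p}_i = p_i \, \frac{1 - P_{\mathrm{nongr}}}{p_{\mathrm{tot}}}. \qquad (11)$$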
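Equations (10) and (11) translate directly into a few lines. The following sketch (function name hypothetical) rescales the raw probabilities of one galaxy so that their sum equals $1 - P_{\mathrm{nongr}}$, which can never exceed unity.

```python
import math


def renormalize(probs):
    """Apply Eqs. (10)-(11) to the raw probabilities p_1..p_n of one galaxy."""
    p_tot = sum(probs)
    if p_tot == 0.0:
        return list(probs)
    p_nongr = math.prod(1.0 - p for p in probs)  # Eq. (10)
    scale = (1.0 - p_nongr) / p_tot              # ratio in Eq. (11)
    return [p * scale for p in probs]


# Raw values 0.6 and 0.7 sum to 1.3 > 1; afterwards they sum to
# 1 - 0.4 * 0.3 = 0.88, with their ratio unchanged.
print(renormalize([0.6, 0.7]))
```

A galaxy associated with a single group keeps its original probability, since in that case $1 - P_{\mathrm{nongr}} = p_{\mathrm{tot}}$.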

For ease of notation, we will simply write $p_i$ instead of $\tilde{p}_i$ in the following and refer to these quantities as "association probabilities".

5.2. Properties of the association probabilities 

In the following, we study the properties of the association probabilities introduced in the previous section in terms of fidelity and completeness. We do this for different group subsamples and different choices of the group extension $r_{\mathrm{gr}}$ and group centers $(\alpha_{\mathrm{gr}}, \delta_{\mathrm{gr}})$. We also compare the distribution of probabilities in the mock catalogs to that in the actual data.

To investigate the fidelity of the association probability, we consider reconstructed mock groups that have a 2WM to a real group ("2WM groups"). A photo-z galaxy is "successfully associated" with such a group if it is a member of the real group. For reconstructed mock groups with no 2WM to real groups ("non-2WM groups"), the definition of a successful association is more subtle. These are the ~30% of reconstructed groups that are not in the bottom layer of Figure 6. If a non-2WM group is fragmented, a successful association means that the photo-z galaxy is a member of the real group with which our reconstructed

Figure 13. Fraction of correct associations as a function of association probability p for different richness classes. The lines show the mean of the fractions computed for each of the 24 mock catalogs, and the error bars indicate the standard deviation of the mean. The solid lines correspond to estimates of p based on $r_{\mathrm{fudge}}$ and the improved centers. The dashed lines correspond to estimates based on $r_{\mathrm{rms}}$ and the standard centers. The red lines correspond to galaxies associated with reconstructed groups exhibiting a 2WM to real groups; ideally these lines should lie on the dotted line. Statistically significant deviations are caused by galaxies that are associated with more than one group, and the probabilities for such galaxies are given by the green lines. The blue lines, in contrast, are for galaxies associated with groups that do not exhibit a 2WM to real groups.

group is associated. An overmerged reconstructed group is associated with more than one real group. In that case, a photo-z galaxy is successfully associated with the reconstructed group if it is a member of the real group that contains the largest fraction of the members of our reconstructed group. Spurious groups have no corresponding real group, so every photo-z association with them is regarded as a failure.

Figure 13 shows the fraction of successful associations as a function of probability p. The red line shows the success of associations for 2WM groups. It should be a diagonal line because the probabilities were calibrated using these groups. The green line shows the result for those galaxies in 2WM groups that have non-zero association probabilities to more than one group, and it also looks satisfactory. The net result for non-2WM groups is shown in blue. These curves lie below the others because of the problems with group identification. The solid lines correspond to estimates of p based on the fudge radius and the improved centers. The dashed lines correspond to estimates based on $r_{\mathrm{rms}}$ and the standard centers. For photo-z galaxies associated with 2WM groups (red versus green lines), the choice of the group extension $r_{\mathrm{gr}}$ seems to have a negligible effect. For the "failed groups", however, the fudge radius $r_{\mathrm{fudge}}$ seems to work better. The reason is that such groups sometimes have strange shapes, so that $r_{\mathrm{rms}}$ is far too large. This results in more (wrongly) associated galaxies than if $r_{\mathrm{fudge}}$ were used. The fudge radius $r_{\mathrm{fudge}}$, in contrast, depends only on the richness and is thus unaffected by the

Figure 14. Completeness of the photo-z group population with probabilities > p. The lines show the median of the 24 mock catalogs, and the error bars show the upper and lower quartiles. Only mock groups with a 2WM to real groups are considered. The blue line corresponds to probabilities p based on $r_{\mathrm{rms}}$. The green line corresponds to p based on $r_{\mathrm{fudge}}$ and standard group centers. The red line corresponds to p based on $r_{\mathrm{fudge}}$ and the improved centers.



shape of the group. 
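The fidelity test behind Figure 13 is essentially a reliability (calibration) curve. Associations are binned in p, and the fraction of successful associations is computed per bin. The following is a minimal sketch under two assumptions: each association is given as a (p, success) pair, with success determined by the rules above, and ten equal-width bins are used.

```python
def reliability_curve(associations, n_bins=10):
    """Return (bin centre, success fraction, count) for each non-empty p bin.

    associations: iterable of (p, success) with 0 < p <= 1 (p = 0 is skipped).
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, success in associations:
        if p <= 0.0:
            continue
        i = min(int(p * n_bins), n_bins - 1)  # p = 1 falls in the last bin
        totals[i] += 1
        hits[i] += int(success)
    return [((i + 0.5) / n_bins, hits[i] / totals[i], totals[i])
            for i in range(n_bins) if totals[i]]


curve = reliability_curve([(0.95, True), (0.92, True), (0.15, False), (0.12, True)])
print(curve)  # well-calibrated p gives a success fraction near each bin centre
```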

The completeness of the group membership for all photo-z galaxies above a given threshold in p is shown in Figure 14. The blue line is for probabilities p based on $r_{\mathrm{rms}}$. The green line is for p based on $r_{\mathrm{fudge}}$ and the standard centers, and the red line is for p based on $r_{\mathrm{fudge}}$ and the improved centers. The biggest difference between the blue curve (using $r_{\mathrm{rms}}$) and the other lines is at low p. There, particularly for small groups, the completeness is significantly lower than for the curves based on $r_{\mathrm{fudge}}$. For small groups, $r_{\mathrm{rms}}$ can be an underestimate, so too few photo-z galaxies are associated with such groups. This is the most significant advantage of using $r_{\mathrm{fudge}}$ instead of $r_{\mathrm{rms}}$. The difference between the choices of group centers is most obvious at high p and for large groups, where the improved centers yield a slight improvement. The choice of the group extension is, however, more important than the choice of the centers.

The fraction of photo-z galaxies with an association probability > p is shown in Figure 15 for the actual data and for the mock catalogs. To allow a meaningful comparison between galaxies with p > 0 and galaxies with p = 0, we restrict the redshift range to 0.1 < z < 0.8, where most of the groups are. About 60% of the photo-z galaxies have zero probability of being associated with any of the spectroscopic groups. The remaining 40% have a non-zero probability of membership in one or more groups. This fraction of possible group members drops quite fast as the p threshold is increased. The slight excess of low-probability members in the actual data is due to the larger number of small groups in the 20k sample (cf. Fig. 5).

We obtain a flux-limited mock group population by including in the groups all potential members with a minimal association probability p. Figure 16 summarizes the completeness and interloper fraction of this population. We show the mean completeness $S_p$ (blue region)




Figure 15. Fraction of photo-z galaxies with an association probability > p in the redshift range 0.1 < z < 0.8. The histogram shows the actual data, and the solid line shows the corresponding fraction within the mock catalogs (error bars are the standard deviation among the 24 mock catalogs). The fraction of galaxies that are not associated with any spectroscopic group (i.e. those with p = 0) is shown on the left. These comprise 60% of the photo-z objects. The p = 0 and p > 0 fractions do not sum to unity because some galaxies have multiple p values due to possible membership in different groups. The slight excess of low-probability members in the actual data is due to the larger number of small groups in the 20k sample.



and mean interloper fraction $I_p$ (red region) of the 24 mock catalogs, where in each mock catalog all reconstructed groups were considered (left panel). We regard as successes only those group members that are members of the corresponding real group. The point p = 1 corresponds to the purely spectroscopic group membership. These statistics are worse than the galaxy success rate $S_{\mathrm{gal}}$ and interloper fraction $f_I$ shown in Figure 3. This is because those statistics were only concerned with whether the galaxy was a member of any group. Furthermore, here we refer to the entire flux-limited population and not only to the spectroscopic sample.

Figure 16 is interpreted as follows. Consider the claimed membership of a given reconstructed mock group, i.e. the spectroscopic members plus all photo-z galaxies above a minimal probability threshold p. The new galaxy success rate $S_p$ is the number of these that are actually members of the corresponding real group, divided by the total membership of that real group. This is given by the blue region of the left-hand panel, which is bounded by the lines for N = 2 and N ≥ 10, where N refers to the observed spectroscopic richness. The interloper fraction $I_p$ is the fraction of claimed members that are not members of this particular group. It is given by the corresponding red region. For example, consider an overmerged reconstructed mock group. Its members that belong to the second real group (the one not regarded as the proper real counterpart) count as failures and increase the $I_p$ statistics. They were not necessarily counted as failures in the earlier $f_I$ statistics. Suppose, however, we know (for reasons beyond our group catalog) that the group we are interested in is properly detected (i.e. has a 2WM to a real group). Then the statistics would improve to the regions

Figure 16. Average statistics for the total flux-limited galaxy sample. The left panel considers all groups in the mock catalogs, and the right panel only those groups with a 2WM to real groups. The blue region shows the galaxy success rate $S_p$ and the red region the interloper fraction $I_p$. The green region shows the fraction of correctly assigned most massive galaxies $S_p^M$, obtained by picking the galaxy with the highest $p_M$. In each case the galaxy sample includes all galaxies with an assignment probability > p. The point p = 1 corresponds to the spectroscopic sample. The solid lines indicate the statistics for groups with richness N ≥ 10, and the dashed lines those for pairs. A galaxy membership is here regarded as a success only if the galaxy is a member of the specific corresponding real group, which makes these statistics rather restrictive. The poorer performance in the left-hand panel is due to the issues of group detection (i.e. overmerging etc.).

in the right panel. Particularly for small groups (dashed 
lines), the difference will be significant owing to the un- 
certainties in the group detection. 
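For a single reconstructed group, the statistics $S_p$ and $I_p$ defined above reduce to set operations. The sketch assumes that three hypothetical inputs are available as galaxy identifiers: the spectroscopic members, the photo-z candidates with their association probabilities, and the members of the matched real group. It includes photo-z candidates with probability of at least p, and all example values are invented.

```python
def success_and_interlopers(spec_members, photo_probs, real_members, p_min):
    """Galaxy success rate S_p and interloper fraction I_p for one group.

    spec_members: ids of spectroscopic members (always claimed).
    photo_probs: dict id -> association probability of photo-z candidates.
    real_members: ids of the members of the corresponding real group.
    """
    claimed = set(spec_members) | {g for g, p in photo_probs.items() if p >= p_min}
    real = set(real_members)
    correct = len(claimed & real)
    s_p = correct / len(real)                      # recovered real members
    i_p = (len(claimed) - correct) / len(claimed)  # claimed non-members
    return s_p, i_p


spec = {"a", "b"}
photo = {"c": 0.8, "d": 0.3, "e": 0.6}
real = {"a", "b", "c", "f"}
print(success_and_interlopers(spec, photo, real, 0.5))  # (0.75, 0.25)
print(success_and_interlopers(spec, photo, real, 1.0))  # (0.5, 0.0)
```

Lowering the threshold adds real members (raising $S_p$) but also admits interlopers such as "e" (raising $I_p$). This is the trade-off traced by the blue and red regions of Figure 16.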

6. APPLICATIONS USING ADDED PHOTO-Z MEMBERS

In this section, we perform four straightforward ap- 
plications considering the potential members on the ba- 
sis of their photo-z. We look in turn at the corrected 
richness, the identification of the most (stellar) massive 
galaxy in the group, the location of the spatial center of 
the group, defined as the minimum of the potential well, 
and finally an approach to identifying the galaxy at that 
center, which we define to be the central galaxy, all other 
group members being satellites. 

The galaxy success rate and interloper fraction of the spec-z group population vary clearly with group-centric distance (cf. Fig. 10 of K09). Motivated by this, we also introduced an association probability p for the spectroscopic galaxies. This prevents spec-z galaxies at the outskirts of the groups from being given too large a weight compared with the photo-z group population. We assigned the probabilities in the same way as for the photo-z galaxies, with two differences. First, we assign probabilities only to spectroscopic galaxies that were already group members. Second, we set $\sigma_z$ to zero, so the association probability was determined only by the projected distance from the group center. For pairs, the assigned probabilities were simply set to one.

6.1. Corrected richness 

A straightforward application of the association probability p is to estimate the corrected richness $N_{\mathrm{corr}}$ of the groups above the flux limit. This is the total richness the groups would have if we knew all their real members down to the flux limit of the survey. We estimate it by summing the probabilities of all group members (spec-z and photo-z). Not surprisingly, the estimated corrected richness is on average unbiased with respect to the real corrected richness for all observed spectroscopic richness classes N, because this was used in establishing the probabilities. It exhibits a scatter of about 30%, depending weakly on N.

The corrected richness could also be estimated from the SSR and RSR (see Sect. 2.1) at the positions of the spec-z group members. However, the resulting corrected richnesses are about 40% too high for groups with N ≤ 4, and their scatter is also larger, at about 50%. This bias for small groups is a selection effect. The observed richness N is the result of a Poisson sampling process when the slits are assigned to the targets, so it has an intrinsic scatter for a given SSR and RSR. If N drops below 2, however, the group cannot be observed and is lost, while scatter toward high N faces no such limit.
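The selection effect can be reproduced with a toy Monte Carlo. It makes three illustrative assumptions. Each of a group's $N_{\mathrm{true}}$ members is observed independently with a fixed sampling rate (here 0.5). Groups with fewer than two observed members are lost. The completeness-corrected richness is the observed richness divided by that rate. Under these assumptions, surviving small groups are biased high while rich groups are not. The numbers are not meant to reproduce the 40% quoted above.

```python
import random
import statistics


def corrected_richness_bias(n_true, rate, n_trials=20000, seed=42):
    """Relative bias of N_obs / rate among groups with N_obs >= 2 (toy model)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Each member is targeted and yields a redshift with probability `rate`.
        n_obs = sum(rng.random() < rate for _ in range(n_true))
        if n_obs >= 2:  # groups with fewer than two members are never detected
            estimates.append(n_obs / rate)
    return statistics.fmean(estimates) / n_true - 1.0


print(corrected_richness_bias(3, 0.5))   # close to +0.5 (exact value is 0.5)
print(corrected_richness_bias(40, 0.5))  # close to 0.0
```

For $N_{\mathrm{true}} = 3$ and a rate of 0.5, the detected groups have $N = 2$ or $3$ with relative weights 3:1. The mean corrected richness is therefore 4.5 instead of 3.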

We conclude that the photo-z galaxies are useful for obtaining unbiased estimates of the corrected richness for all groups. We can, of course, also estimate the corrected richness $N_{\mathrm{corr}}(M_{B,\mathrm{lim}})$ with respect to a given absolute magnitude limit $M_{B,\mathrm{lim}}$ (cf. Sect. 4.2 of K09).

6.2. Identifying the most massive galaxy of the group 

We introduce the probability $p_M$ that a galaxy is the most massive (in terms of stellar mass) in a given group. To compute it, we sort all the members, spectroscopic as well as photometric, in descending order of stellar mass, such that $M_{i-1} \geq M_i$ for $i \in 2, \ldots, N_{\mathrm{tot}}$. Here $N_{\mathrm{tot}}$ is the number of spectroscopic and photometric members. The probability $[p_M]_i$ that a given galaxy is the most massive galaxy in the group depends on two factors. One is its own probability of membership, $p_i$. The other is the probabilities of non-membership of the higher-ranked galaxies. For the first-ranked galaxy, $[p_M]_1 = p_1$, and for the remainder

$$[p_M]_i = p_i \prod_{j=1}^{i-1} (1 - p_j), \qquad i \in 2, \ldots, N_{\mathrm{tot}}. \qquad (12)$$
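Equation (12) can be evaluated in a single pass over the mass-ordered members. In this sketch (names and example values hypothetical), each member is given as a (stellar mass, membership probability) pair, and spectroscopic members simply carry their own $p_i$.

```python
def most_massive_probabilities(members):
    """Return [(stellar_mass, p_M)] in descending order of stellar mass.

    members: list of (stellar_mass, p) pairs; Eq. (12) gives
    [p_M]_i = p_i * prod_{j<i} (1 - p_j), with [p_M]_1 = p_1.
    """
    ranked = sorted(members, key=lambda m: m[0], reverse=True)
    result = []
    none_above = 1.0  # probability that no higher-ranked galaxy is a member
    for mass, p in ranked:
        result.append((mass, p * none_above))
        none_above *= 1.0 - p
    return result


members = [(1e10, 0.5), (5e10, 0.8), (2e10, 1.0)]
# The 5e10 galaxy gets p_M = 0.8; the certain member below it gets 0.2;
# the least massive galaxy gets 0, since a more massive certain member exists.
print(most_massive_probabilities(members))
```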

Figure 17 compares $p_M$ to the empirical fraction of correctly identified most massive galaxies within the mock catalogs. Ideally, this would follow the dotted diagonal line: galaxies with a given value of $p_M$ should be the real most massive galaxies in a fraction $p_M$ of cases. The red curve uses association probabilities p based on $r_{\mathrm{fudge}}$, and the blue curve uses those based on $r_{\mathrm{rms}}$. The black curve is based on $r_{\mathrm{fudge}}$ but does not include observational errors in the stellar mass determination. These errors are included at the level of 0.2 dex in the red and blue curves. The conclusion is that the basic scheme works (as would be expected), but mass estimation uncertainties will be significant. It makes no substantial difference whether the association probabilities p used to compute $p_M$ are based on $r_{\mathrm{rms}}$ or $r_{\mathrm{fudge}}$. However, the 0.2 dex uncertainty in stellar mass causes $p_M$ to be underestimated for large $p_M$. Nevertheless, there is a strong correlation between $p_M$ and the fraction of cases in which the galaxy under consideration is the real most massive galaxy of the group. For a cut of, for instance, $p_M > 0.7$, the true probability is still higher than 50%, i.e. such a

Figure 17. Fraction of correctly identified most massive galaxies within reconstructed 2WM mock groups as a function of $p_M$. Each panel is for a different richness class, as indicated. The lines show the mean of the fractions computed for each of the 24 mock catalogs, and the error bars indicate the error of the mean. The red curve corresponds to $p_M$ based on $r_{\mathrm{fudge}}$ and the improved group centers. The blue curve corresponds to $p_M$ based on $r_{\mathrm{rms}}$ and the standard centers. The black line corresponds to the former case but does not include observational errors of 0.2 dex in stellar mass.

Figure 18. Distribution of the probabilities $p_M$ of those galaxies with the highest $p_M$ in their groups. These galaxies can be either spectroscopic or photometric. Each panel corresponds to a different richness class. The black histogram corresponds to the mean distribution of the 24 mock catalogs, and the error bars indicate the standard deviation. The dotted histogram considers only the mock groups having a 2WM to real groups. The red histogram shows the distribution for the actual zCOSMOS 20k groups. Most groups clearly have an identified candidate for being the most massive galaxy.



galaxy has a bigger chance of being the most massive 
galaxy than all the other candidates in its group put to- 
gether. For a proper interpretation of pm it is, however, 
important to keep this effect of the mass uncertainty in 
mind. 

To assess the usefulness of this scheme, this figure should be combined with Figure 18. That figure shows the distribution of $p_M$ as a function of richness class, for both the mock catalogs and the actual 20k data. For each richness class, the distribution of $p_M$ among the galaxies with the highest $p_M$ in their groups is a steep function of $p_M$. This tells us that most groups have a clear candidate for being the most massive galaxy. The actual 20k sample (red histogram) follows the histogram for the mock catalogs (black solid) fairly well, except perhaps at the largest $p_M$. Despite the uncertainty in stellar mass, $p_M$ is a very useful concept and works reasonably well for the actual data.

It should be noted that in Figure 17, p_M computed from the unperturbed stellar masses slightly underestimates the true probability for p_M ≳ 0.5, as measured by the fraction of such galaxies that really are the most massive members of their groups; i.e. the black line lies slightly above the dotted diagonal line. This is because the association probabilities p were derived irrespective of the mass or luminosity of the galaxies. If massive galaxies are more likely to be in groups, then this procedure will have underestimated p, and thus p_M, for these more massive galaxies, producing the small offset observed in Figure 17.
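The calibration test described above, comparing p_M with the fraction of galaxies that really are the most massive members of their groups, amounts to a reliability curve. The sketch below is a minimal illustration under assumed inputs: `p_m` holds predicted probabilities and `is_most_massive` holds truth flags from a mock catalog; the function name and the default bin edges are hypothetical choices, not the binning used for Figure 17.

```python
from typing import List, Sequence, Tuple


def reliability_curve(
    p_m: Sequence[float],
    is_most_massive: Sequence[bool],
    edges: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted p_M, observed fraction, count) for each bin.

    A calibrated p_M lies on the diagonal; an observed fraction above the
    mean prediction means p_M underestimates the true probability.
    Empty bins are skipped.
    """
    rows = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        last = k == len(edges) - 2
        # the upper edge is included only in the last bin, so p_M = 1 is counted
        idx = [i for i, p in enumerate(p_m) if lo <= p < hi or (last and p == hi)]
        if not idx:
            continue
        mean_p = sum(p_m[i] for i in idx) / len(idx)
        frac = sum(1 for i in idx if is_most_massive[i]) / len(idx)
        rows.append((mean_p, frac, len(idx)))
    return rows
```

Each returned row can be plotted as observed fraction versus mean prediction; points above the diagonal at high p_M would correspond to the small offset discussed above.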

The average success rate S_pM of detecting the most massive galaxy within the reconstructed mock groups as a function of the probability threshold is shown in Figure 16. For high-richness groups, this success rate increases by about 10%-20% when we include photo-z galaxies, while it is relatively constant for spec-z pairs. So the inclusion of the photo-z galaxies has a rather small effect on S_pM. In fact, for 87% of all groups the galaxy with the largest p_M has a spectroscopic redshift (for the mock catalogs this number is 84% ± 2%). It might be thought that this ratio should equal the average spatial completeness of the survey, since this determines the chance that a given galaxy is observed spectroscopically. This will be the case for very rich groups, which would be recognized regardless of the statistical fluctuations of spatial sampling. For poorer groups, however, there is a selection effect in that those with higher spectroscopic sampling are more likely to be recognized as a group. Indeed, for "real" pairs, both members must have been observed spectroscopically for the group to be recognized, and the most massive galaxy will therefore always be a spectroscopic galaxy.

Does this relatively modest gain mean that it is not worth bothering with the photo-z? The answer is no. First, we show in the next section that there are significant gains in finding the spatial center of the group. Second, the inclusion of photo-z objects dramatically reduces the number of galaxies that are incorrectly identified as the most massive in the richer groups, which may be among the most interesting from a galaxy evolution point of view. It should also be noted that for these statistics the identification of the most massive galaxy is only regarded as a success if it is the most massive galaxy of the specific group we think it is a member of. Selecting a galaxy that is the real most massive galaxy of another group (even one that has been detected) is counted here as a failure. This is a rather restrictive perspective, and depending on the application it may be sufficient to know whether a certain galaxy is the most massive of any group (cf. Sect. 6.4).

6.3. Locating the spatial group center 

Another immediate application of the association probabilities p is to estimate the centers of the groups. By group center we mean the center of the corresponding DM halo, which is defined by the position of the deepest point in the gravitational potential well. In the Millennium simulation this position is given by the most bound particle within a halo and is also, by construction, the position of the "central galaxy" in the halo.

With the aid of the mock catalogs we can test several estimators E for the group center and compare their relative accuracy. Some of these estimators are based on the areas of the Voronoi cells of the projected group galaxy positions (see also Presotto et al. 2012). To compute these areas, we project a group onto the plane perpendicular to the line of sight and perform a two-dimensional Voronoi tessellation, considering only the group galaxies (either only spec-z or both spec-z and photo-z) and the spectroscopic field galaxies surrounding the group; the latter prevent the Voronoi cells at the outskirts of the group from becoming infinite. We expect the Voronoi areas to be smaller on average toward the center of the groups.
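The Voronoi areas A described above can be computed in many ways. The sketch below uses a simple, dependency-free half-plane clipping (quadratic in the number of points, adequate for one group plus its surrounding field galaxies) rather than whatever tessellation code was used for the catalog. It assumes positions already projected onto a plane perpendicular to the line of sight; the function names and the padding of the bounding box are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _clip(poly: List[Point], d: Point, c: float) -> List[Point]:
    """Keep the part of a convex polygon where x*d[0] + y*d[1] <= c."""
    out: List[Point] = []
    n = len(poly)
    for k in range(n):
        a, b = poly[k], poly[(k + 1) % n]
        fa = a[0] * d[0] + a[1] * d[1] - c
        fb = b[0] * d[0] + b[1] * d[1] - c
        if fa <= 0.0:
            out.append(a)
        if fa * fb < 0.0:
            # the edge a-b crosses the bisector: keep the crossing point
            t = fa / (fa - fb)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out


def _area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon with ordered vertices."""
    s = 0.0
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def voronoi_areas(points: Sequence[Point], pad: float = 10.0) -> List[float]:
    """Area of each point's 2D Voronoi cell, or math.inf for open cells.

    Each cell starts as a large bounding box and is clipped by the
    perpendicular bisector to every other point. Cells that still touch
    the box (padded by `pad` times the data span) are treated as unbounded.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1e-12)
    x0, x1 = min(xs) - pad * span, max(xs) + pad * span
    y0, y1 = min(ys) - pad * span, max(ys) + pad * span
    areas = []
    for i, (px, py) in enumerate(points):
        cell = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        for j, (qx, qy) in enumerate(points):
            if j == i:
                continue
            d = (qx - px, qy - py)
            # points closer to p than to q satisfy x.d <= midpoint.d
            c = 0.5 * ((qx + px) * d[0] + (qy + py) * d[1])
            cell = _clip(cell, d, c)
        open_cell = any(x <= x0 or x >= x1 or y <= y0 or y >= y1 for x, y in cell)
        areas.append(math.inf if open_cell else _area(cell))
    return areas
```

Including the surrounding field galaxies in `points`, as done in the text, closes the cells of the outer group members; only the areas of the group members would then be used.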

In the following, we use the mock catalogs to test 10 different estimators, E1 to E10, for identifying the group centers. The estimators E1 to E4 depend on the spectroscopic information only:

E1: mean of the positions of the spectroscopic members;

E2: stellar mass weighted mean of the positions of the spectroscopic members;

E3: inverse Voronoi area weighted mean of the positions of the spectroscopic members;

E4: stellar mass and inverse projected Voronoi area weighted mean of the positions of the spectroscopic members.

The estimators E5 to E8 also include the information from the photometric galaxies. They are basically identical to the former estimators, except that each galaxy, spec-z as well as photo-z, is additionally weighted by its association probability p:

E5: probability weighted mean of the positions of all group members;

E6: probability and stellar mass weighted mean of the positions of all group members;

E7: probability and inverse Voronoi area weighted mean of the positions of all group members;

E8: probability, stellar mass, and inverse Voronoi area weighted mean of the positions of all group members.
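Estimators E1 to E8 differ only in the weight attached to each member position. A minimal sketch follows, assuming positions already projected onto a plane (x, y) and using hypothetical argument names. For E1 to E4 one passes only the spectroscopic members and no `p`; for E5 to E8 one passes all members together with their association probabilities (treating spec-z members as having p = 1 is an assumption of this sketch). Returning None when any Voronoi area is infinite is our reading of "not defined" in the text.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_center(
    x: Sequence[float],
    y: Sequence[float],
    p: Optional[Sequence[float]] = None,
    mstar: Optional[Sequence[float]] = None,
    area: Optional[Sequence[float]] = None,
) -> Optional[Tuple[float, float]]:
    """Weighted mean of projected member positions.

    Each supplied array multiplies the weight: p (E5-E8), stellar mass
    (E2, E4, E6, E8) and inverse Voronoi area (E3, E4, E7, E8).
    Returns None if an area is infinite or all weights vanish.
    """
    if area is not None and any(math.isinf(a) for a in area):
        return None
    w = []
    for i in range(len(x)):
        wi = 1.0
        if p is not None:
            wi *= p[i]
        if mstar is not None:
            wi *= mstar[i]
        if area is not None:
            wi /= area[i]
        w.append(wi)
    total = sum(w)
    if total <= 0.0:
        return None
    cx = sum(wi * xi for wi, xi in zip(w, x)) / total
    cy = sum(wi * yi for wi, yi in zip(w, y)) / total
    return cx, cy
```

For example, E1 corresponds to `weighted_center(x_spec, y_spec)` and E8 to `weighted_center(x_all, y_all, p=p_all, mstar=m_all, area=a_all)`, where the array names are placeholders.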

Table 8

Borders dividing the four quartiles of the distribution of r_rms for group populations of different richness classes

Richness class    1/2-quartile    2/3-quartile    3/4-quartile
N = 2             0.0011          0.0034          0.0055
3 ≤ N ≤ 4         0.0042          0.0064          0.0089
5 ≤ N ≤ 9         0.0079          0.0109          0.0150
N ≥ 10            0.0140          0.0193          0.0273

Note. The values are given in degrees.

The estimators E9 and E10 are no longer based on the (weighted) mean of the positions of the group members, but attempt to identify the central galaxies of the groups directly. They are defined by selecting for each group the galaxy with the largest ratio R, as follows:

E9: location of the galaxy with the largest R = p/A;

E10: location of the galaxy with the largest R = p × M*/A.

Here, M* is the stellar mass and A the Voronoi area of the galaxies. Note that the estimators based on the Voronoi area A are not necessarily defined for all groups, since A may be infinite for the members of small isolated groups near the border of the survey.
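E9 and E10 replace averaging by selecting a single member. The sketch below returns the index of the member with the largest R; the function name is hypothetical, and returning None whenever any member has an infinite Voronoi area is an assumption about how the undefined case is handled.

```python
import math
from typing import Optional, Sequence


def central_candidate(
    p: Sequence[float],
    area: Sequence[float],
    mstar: Optional[Sequence[float]] = None,
) -> Optional[int]:
    """Index of the member with the largest R = p/A (E9) or p*M*/A (E10)."""
    if any(math.isinf(a) for a in area):
        return None  # estimator treated as undefined for this group

    def ratio(i: int) -> float:
        r = p[i] / area[i]
        return r * mstar[i] if mstar is not None else r

    return max(range(len(p)), key=ratio)
```

The estimated group center is then simply the projected position of the returned member.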

The median physical offsets between the estimated mock group centers and the real group centers are shown in Figure 19 for different richness classes and for different apparent group extensions r_rms (see Eq. (9)). The dependence on r_rms is considered by dividing each group population within a richness class into the four quartiles of its distribution in r_rms. The values of the boundaries dividing the four quartiles are given in Table 8. The performance of the different estimators E1 to E10 depends strongly on both the extension and the richness of the group. Several conclusions can be drawn:

• The smaller the group extension, that is, the 
smaller the group appears projected on the sky, 
the smaller the offset from the true group center. 

• The larger the observed richness of the group, the more effective it is to weight the galaxies by stellar mass or by the inverse of the Voronoi area.

• For groups with N < 5, weighting by stellar mass is more effective than weighting by the inverse of the Voronoi area; for N ≥ 5 the reverse is true.

• Weighting by both stellar mass and the inverse of 
the Voronoi area is superior to weighting by ei- 
ther of these alone for all richness classes except 
for pairs. 

• The larger N and the more extended the group, the more effective is the inclusion of the photo-z galaxies. For N < 5 there is hardly any gain from using the photo-z information, whereas for N ≥ 5 there is a clear gain for all estimators, particularly for extended groups.

• The size of the error bars suggests that the most 
robust estimator for pairs is the simple geometrical 
mean between the two galaxies. 

• For all groups except pairs, by far the best estimator is E10. For groups with N ≥ 5, for at least half of the groups of any extension the central galaxy (not necessarily the most massive, see below) is correctly identified (i.e. the offset equals zero), and for three quarters of the groups the offset from the true group center is less than about 20 kpc. Compared to the typical extension of a group, of the order of half an Mpc, this is an extremely good result.

Figure 19. Projected physical offset between the estimated group centers and the true group centers in the mock catalogs. The lines show the median offsets of all reconstructed 2WM groups within the 24 mock catalogs, and the error bars indicate the upper and lower quartiles. The x-axis plots the four quartiles of the apparent group extension r_rms (see Tab. 8). The estimators are indicated for each row in the left panel, and the richness class increases toward the right. Blue lines contain only spec-z information, and red and green lines contain spec-z and photo-z information. For comparison, E1 is shown in all panels as a dotted line.

However, regarding E10 we have to be careful, since the mock catalogs by definition have a massive galaxy at the centers of their groups. This "central galaxy paradigm" is still under investigation (see e.g. Skibba et al. 2011). Also note that in the mock catalogs the central galaxy of a group is not always the most massive: in about 20%-25% of all real mock groups (depending weakly on N) there is a more massive galaxy within the magnitude-limited group population.



Also note that 50%-60% of the galaxies selected by E10 are also the galaxies with the highest probability p_M within their group. Although the two concepts are similar, they are not equal. In particular, p_M can be assigned to each galaxy in a group and yields a quantitative measure of fidelity, rather than just selecting a particular galaxy without giving any information on "how good" this selection is. Moreover, p_M is entirely independent of any assumption about where massive galaxies preferentially reside within a group.

Given the results in Figure 19, we have chosen the "improved group centers", in contrast to the "standard group centers" given by E1, as follows. For groups with N = 2 we kept the centers as E1; for 3 ≤ N ≤ 4 we took E3 if available (93% of cases) and otherwise E1; and for N ≥ 5 we took E7 (always available). So only for N ≥ 5 have we used information from photo-z, and in neither case have we used information from stellar mass. Inspection of Figure 19 shows that the improved group centers should exhibit offsets < 100 kpc for basically all richness classes and group extensions.

The effect on the group center estimates E1 to E10 of using association probabilities p based on the improved centers instead of the standard centers is strongest for E5, especially for rich and extended groups. The differences in the offsets are, however, never larger than ~30%, and for E7 they are basically negligible. This shows that the iterative process for deriving the association probabilities indeed converges after one iteration.

6.4. Separating central and satellite galaxies 

As mentioned in the previous section, the central 
galaxies, which we defined to be those galaxies located 
at the minimum of the gravitational potential, are not 
necessarily the most massive galaxies within the halos. 
However, in terms of evolutionary processes, it is likely 
that it is the location in the potential well that is most 
relevant, and so in the following we will discuss how well 
we can differentiate between the central galaxies and the 
remaining, so-called satellite galaxies. For both centrals 
and satellites we will differentiate between simply know- 
ing whether a galaxy is a central or a satellite, and the 
more stringent case of additionally knowing which group 
or halo it is the central or satellite of. 

Can we add spatial information to p_M? Motivated by the performance of the group center estimator E10 (see Fig. 19), we introduce another probability p_MA, which is computed similarly to p_M, but instead of ranking the group galaxies by their stellar mass M*, we rank them by M*/A, with A being the area of the projected Voronoi cell of the galaxy; p_MA thus directly includes information on the local galaxy density. It should, however, be noted that p_M already includes some positional information because of the radial dependence of the group membership probability p. In the following we discuss how the fraction of centrals f_c and of satellites f_sat varies across different galaxy samples that are selected in terms of p, p_M, p_MA, and N (see footnote). The results are summarized in Figures 20 and 21 and Table 9. To allow for a sensible comparison between the numbers of centrals and satellites in the group and non-group galaxy populations, we restrict the redshift range to 0.1 < z < 0.8.
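The precise definition of p_M is given earlier in the paper and is not restated here. The sketch below assumes the natural form in which group memberships are independent: a galaxy is the top-ranked member if it is itself a member (probability p) and none of the higher-ranked candidates is. The function name is hypothetical. Passing stellar masses as the ranking key mimics p_M; passing M*/A mimics p_MA as described above. With at least one spec-z member at p = 1, the values sum to 1, so a value above 0.5 exceeds all other candidates combined.

```python
from typing import List, Sequence


def prob_top_ranked(p: Sequence[float], key: Sequence[float]) -> List[float]:
    """Probability that each galaxy is the top-ranked true member of a group.

    Assumes independent memberships: galaxy i is top-ranked if it is a
    member (probability p[i]) and no galaxy with a larger key is a member.
    """
    order = sorted(range(len(p)), key=lambda i: key[i], reverse=True)
    result = [0.0] * len(p)
    none_above = 1.0  # probability that no higher-ranked galaxy is a member
    for i in order:
        result[i] = p[i] * none_above
        none_above *= 1.0 - p[i]
    return result
```

For instance, with p = [1.0, 0.5, 0.8] and stellar masses [5, 10, 1], the most massive candidate is a photo-z galaxy with p = 0.5, so the spec-z member shares the probability equally with it and the least massive galaxy gets zero.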

The fraction f_c of galaxies that are the central galaxies of their DM halo is shown in Figure 20 for different samples of galaxies from the mock catalogs. It should be noted that 72% of galaxies in the overall flux-limited galaxy sample, selected irrespective of any group membership, are central galaxies (left panel). If those galaxies (with either spec-z or photo-z) that are associated with groups (i.e. which have p > 0) are excluded, then this fraction rises to 83% (second panel from the left). So, if a large sample of central galaxies is needed, irrespective of the halos in which they reside, then simply selecting the non-group galaxies will already produce a rather pure sample, albeit one biased toward lower-mass halos.

Owing to the multiple group associations of some photo-z galaxies, a selection by p, p_M, and p_MA does not, in general, lead to a sample of galaxies with unique group membership. Where needed, we resolve this degeneracy by taking, for each galaxy with multiple group associations, the association with the highest p.



Table 9

Fractions of centrals and satellites in different galaxy samples in the redshift range 0.1 < z < 0.8

                                 f_c                  f_sat
Sample        Number^a     Any^b    Spec.^c     Any^b    Spec.^c
All           30231        0.72     -           0.28     -
p = 0         19161        0.83     -           0.17     -

Group galaxies selected by p_M > 0.5 and p_MA > 0.5
N = 2         453          0.77     0.50        -        -
3 ≤ N ≤ 4     183          0.80     0.65        -        -
5 ≤ N ≤ 9     72           0.78     0.70        -        -
N ≥ 10        24           0.79     0.75        -        -

Group galaxies selected by p > 0.5, p_M < 0.1, and p_MA < 0.1
N = 2         759          -        -           0.65     0.61
3 ≤ N ≤ 4     794          -        -           0.74     0.70
5 ≤ N ≤ 9     976          -        -           0.79     0.73
N ≥ 10        956          -        -           0.84     0.76

^a Number of galaxies in the corresponding actual data samples. For the mock catalogs see Figs. 20 and 21.
^b Statistics for centrals and satellites irrespective of their specific group memberships.
^c Statistics for centrals and satellites residing in the specific groups we think they are members of.

However, if we want a sample of centrals extending up into the range of halo masses of our groups, we can still produce fairly pure samples by making a cut in either p_M or p_MA, or both, as shown in the third and fourth panels from the left. Interestingly, p_MA actually performs worse than p_M, but making a cut in p_M and p_MA simultaneously produces a very pure sample of centrals at the cost of numbers (green curves). For instance, by making the simultaneous cut p_M > 0.5 and p_MA > 0.5 in groups with N ≥ 5, we obtain a sample of about 100 centrals that is pure at the level of 80%, in the sense that 80% of the galaxies are indeed centrals. However, 10% of these are actually centrals of a different halo than that identified in the reconstructed group catalog, and so the purity defined in terms of being the central of a correctly identified group (i.e. with a 2WM) reduces to about 70% (dashed lines, see Tab. 9).
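The distinction between the two purities above (central of any halo versus central of the correctly identified group) can be made explicit in a short sketch. The `MockGalaxy` record and its field names are hypothetical; `matched_halo` stands for the real halo that has a 2WM to the galaxy's reconstructed group, or None if there is none.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MockGalaxy:
    p_m: float                   # probability of being top-ranked by M*
    p_ma: float                  # probability of being top-ranked by M*/A
    is_central: bool             # true central of some DM halo in the mock
    true_halo: int               # halo the galaxy actually belongs to
    matched_halo: Optional[int]  # real halo 2WM-matched to its assigned group


def central_purities(
    gals: Sequence[MockGalaxy], cut: float = 0.5
) -> Tuple[int, float, float]:
    """Select p_M > cut and p_MA > cut; return (n, purity_any, purity_specific)."""
    sel = [g for g in gals if g.p_m > cut and g.p_ma > cut]
    if not sel:
        return 0, float("nan"), float("nan")
    any_ok = sum(g.is_central for g in sel)
    specific_ok = sum(
        g.is_central and g.matched_halo is not None and g.true_halo == g.matched_halo
        for g in sel
    )
    return len(sel), any_ok / len(sel), specific_ok / len(sel)
```

The specific purity can never exceed the purity with respect to any halo, which is why the dashed lines in Figure 20 lie below the solid ones.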

In Figure 20 we show the central fraction f_c only for those galaxies with a selection probability p_sel > 0.1, where p_sel is either p_M or p_MA or the intersection of the two. For the remaining galaxies, the fraction f_c is much lower than for the sample with p_sel > 0.1, so this sample consists mostly of satellites. The fraction of satellites f_sat in this sample is shown in Figure 21 as a function of an additional selection by the association probability p. While the curves in Figure 20 hardly depend on an additional selection in p, the fractions of satellites in Figure 21 are sensitive to a lower limit in p. On the other hand, the choice of p_sel has a negligible effect on the fraction of satellites.

The interpretation of Figure 21 is relatively straightforward: if a galaxy has a very low p_sel but simultaneously a high association probability p, it should be a satellite. For a probability selection of p > 0.5 in groups with N ≥ 5, we obtain a sample of ~2000 galaxies, of which about 80% are indeed satellites of a group (solid lines), and 75% are satellites in the specific groups that we think they are in (dashed lines, see Tab. 9).

We may also want to classify all galaxies as either centrals or satellites. That is, rather than just producing subsamples of centrals or satellites with high purity, we require the samples of centrals and satellites to be complementary, so that together they make up the full flux-limited sample. For any such division we can compute the completeness as well as the purity for both centrals and satellites, where we are not interested in their specific group membership. Note that the purities of centrals and satellites are simply given by f_c and f_sat, respectively, of the corresponding sample. For both samples, purity and completeness are anti-correlated, such that a high purity implies a low completeness and vice versa. Additionally, the purity of satellites is tied to the completeness of centrals, and vice versa, since every central misclassified as a satellite both lowers the completeness of centrals and contaminates the satellite sample. This is similar to optimizing the group-finding parameters to obtain an optimal group catalog (cf. Fig. 4 of K09). One has to tune the thresholds in p, p_M, etc., to find the best compromise between the completeness and purity of either sample. A sensible compromise for producing the satellite sample is to select galaxies with p > 0.1, p_M < 0.5, and p_MA < 0.5, while all non-satellite galaxies constitute the centrals (see Tab. 10). This yields a completeness and purity of centrals of 89% and 81%, respectively, and a completeness and purity of satellites of 45% and 62%, respectively.

Figure 20. Fraction of centrals f_c for different mock galaxy samples in the redshift range 0.1 < z < 0.8. Upper panels: the left panel shows the fraction of centrals for the total flux-limited sample, and the second from the left the fraction for those galaxies (spec-z and photo-z) which are not associated with any group (i.e. p = 0). The third and fourth panels from the left show the fraction of central galaxies within our mock group population for two different richness classes, where the selection probability p_sel is p_MA for the blue lines, p_M for the red lines, and the intersection of the two for the green lines. The solid lines correspond to the fraction of galaxies being the centrals of any group, while the dashed lines correspond to the fraction of correctly identified centrals of the corresponding specific real group. Lower panels: the number of mock galaxies in the selected samples. In all panels, the error bars indicate the standard deviation among the 24 mock catalogs. For the points in the left two panels, the error bars are smaller than the size of the points. Note that only galaxies with p_sel > 0.1 are shown. The fractions and numbers for galaxies with p_sel < 0.1 deviate strongly from the relatively constant curves shown here (cf. Fig. 21). Note that the difference between the solid and the dashed lines for groups with 2 ≤ N ≤ 4 comes mainly from the uncertainty in the detection of pairs, and that the numbers of centrals in the actual data are similar to those in the mock catalogs (cf. Tab. 9).

Table 10

Completeness and purity for complementary samples of centrals and satellites in the redshift range 0.1 < z < 0.8

                      Centrals^b            Satellites^c
Sample^a              Compl.    Purity      Compl.    Purity
Spec-z & photo-z      0.89      0.81        0.45      0.62
Spec-z only           0.93      0.84        0.54      0.74

^a All galaxies are subjected to a binary central-satellite classification.
^b Centrals are given by all non-satellite galaxies.
^c Satellites are selected by p > 0.1, p_M < 0.5, and p_MA < 0.5.
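Because the two classes are complementary, the satellite completeness and purity follow from the central ones once the true central fraction of the classified sample is known. The minimal sketch below (hypothetical function name) makes this bookkeeping explicit. Assuming that this fraction is the 72% found for the full flux-limited sample in Table 9, the central values of 89% and 81% imply a satellite completeness of about 46% and a purity of about 62%, consistent within rounding with the quoted 45% and 62%.

```python
from typing import Tuple


def satellite_stats(
    f_central: float, compl_c: float, purity_c: float
) -> Tuple[float, float]:
    """Satellite (completeness, purity) implied by a binary central/satellite split.

    f_central: true fraction of centrals in the classified sample.
    compl_c, purity_c: completeness and purity of the 'central' class.
    Every galaxy not labelled central is labelled satellite.
    """
    tp_c = compl_c * f_central       # true centrals labelled central
    labelled_c = tp_c / purity_c     # all galaxies labelled central
    fp_c = labelled_c - tp_c         # satellites wrongly labelled central
    tp_s = (1.0 - f_central) - fp_c  # satellites labelled satellite
    labelled_s = 1.0 - labelled_c    # all galaxies labelled satellite
    return tp_s / (1.0 - f_central), tp_s / labelled_s
```

The same relations show why the satellite statistics are the more fragile ones: satellites are the minority class, so a modest number of misclassified centrals is a large fraction of the satellite sample.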

Since the spectroscopic 20k sample constitutes basically an unbiased subsample of the total flux-limited sample, we can restrict our study of centrals and satellites to the spectroscopic galaxies (once we have used photo-z objects to help classify them). In this case, the completeness and purity for centrals are 93% and 84%, respectively, and the completeness and purity for satellites are 54% and 74%, respectively. Note that the statistics of the satellites in particular have improved, because the group membership is much better constrained for the spec-z sample.

Figure 21. Fraction of satellites f_sat in the mock group population selected by p_M < 0.1 and p_MA < 0.1 as a function of a lower limit on the association probability p. The two panels correspond to the richness classes as indicated, and the galaxy samples are restricted to the redshift range 0.1 < z < 0.8. Upper panels: the solid lines refer to the fractions of selected galaxies that are satellites of any group, while the dashed lines correspond to the fractions of selected galaxies that are satellites of the corresponding specific real group. Lower panels: the number of mock galaxies in the selected samples. In all panels, the error bars show the standard deviation among the 24 mock catalogs. Note that the numbers of satellites in the actual data are similar to those in the mock catalogs (cf. Tab. 9).

The conclusion of this section is that by applying a selection of galaxies in p, p_M, p_MA, and N we can produce samples of centrals and satellites of varying purity and size. As expected, the size of the sample decreases with increasing demands on purity. Different levels of purity can also be obtained at the cost of biases in halo mass. For instance, a very large set of highly pure centrals (83% pure) is obtained by excluding all galaxies that can possibly be associated with any detected group, but this obviously then excludes all centrals in the more massive halos we have detected. Dividing all objects in the flux-limited sample into centrals and satellites yields a set of centrals that is 81% pure and a set of satellites that is 62% pure. As with most other aspects of identifying groups at high redshift, the actual construction of samples must be carefully considered in the light of the scientific requirements.

7. DISCUSSION 

In this section we summarize the main properties of our 
20k group catalog and comment on the general difficulties 
of producing high-quality group catalogs. 

The catalog that we have presented in this paper contains almost 1500 groups, of which ~570 host three or more spectroscopic galaxies. Based on detailed analyses using realistic simulated mock group catalogs, about 75% of the groups with three or more members should be real in the sense that they exhibit a one-to-one correspondence (i.e. 2WM) to real groups. The remainder are either fragmented, overmerged, or entirely spurious. The overall purities and completenesses for these groups (even relative to only "detectable groups" and if we do not care about the nature of the group, i.e. 1WM) are about 83%. For groups that host only two spectroscopic galaxies, the statistics are somewhat worse. Fortunately, for groups with more than two spec-z members these statistics are basically independent of the observed spectroscopic richness N over a broad range of redshift, and the number of groups as a function of N should be an unbiased tracer of the number of real groups.

Given the work involved, this overall result might appear disappointing. Even the relatively simple task of differentiating centrals and satellites is quite difficult, especially if one wants to classify all galaxies. In the latter case, we get at best a completeness and purity of centrals of 93% and 84%, respectively, and of satellites of 54% and 74%, respectively. Many problems have their origin in issues concerning the basic group catalog (e.g. overmerging, fragmentation). And yet these statistics are very good compared with other group catalogs at high redshift in the literature (e.g. Gerke et al. 2005; Cucciati et al. 2010; Gerke et al. 2012). So it is presumably just an unpleasant fact of life that the construction of high-quality group catalogs (at least when using only spatial galaxy information) is very difficult and subject to several limitations. The reason is that groups, in contrast to massive clusters and single galaxies, by their nature exhibit a rather low density contrast against the general field, which makes them difficult to detect, and they suffer from problems that can hardly be cured (e.g. overlapping groups in redshift space, interlopers in redshift space; see K09 for a discussion of the difficulties in detecting groups).

Can we do better? In this paper we discussed the exploitation of high-quality photo-z to compensate for the incompleteness of spec-z galaxies in our sample. It is very unlikely that one could use these photo-z to detect new groups that were not detected before with the spec-z, since even these high-quality photo-z have an uncertainty of δz ≈ 0.01(1 + z), which amounts to several times the extension of a group along the line of sight. However, the photo-z are quite useful in characterizing groups that have already been detected. Big groups in particular benefit greatly from the photo-z information, insofar as it significantly improves the estimation of the group centers (< 100 kpc offsets) and prevents mistakes in assigning most massive galaxies to groups. The inclusion of photo-z also allows unbiased estimates of the corrected richness N_corr to an accuracy of ~30%.
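The claim that the photo-z error spans several group extensions can be checked with a line of arithmetic: a redshift error δz = 0.01(1 + z) corresponds to a rest-frame line-of-sight velocity error c δz/(1 + z) = 0.01 c ≈ 3000 km/s at any redshift. The sketch below compares this with a group velocity dispersion; the value of 400 km/s used in the usage note is purely an illustrative assumption, not a number from this paper.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def photoz_velocity_error(coeff: float = 0.01) -> float:
    """Rest-frame velocity error for dz = coeff * (1 + z): c * dz / (1 + z) = c * coeff."""
    return C_KM_S * coeff


def ratio_to_group(sigma_v_group: float, coeff: float = 0.01) -> float:
    """How many group velocity dispersions the photo-z error spans."""
    return photoz_velocity_error(coeff) / sigma_v_group
```

With an assumed dispersion of 400 km/s, `ratio_to_group(400.0)` gives about 7.5, i.e. a single photo-z cannot localize a galaxy within one group along the line of sight.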

Would a 100% sampled spectroscopic survey with the same flux limit produce a "better" group catalog? While a higher sampling rate will find more groups of lower average mass, the figures of merit such as g1, g2, etc. (see Sect. 3.1) will not improve substantially, since these are defined relative to the groups that should have been detectable in the survey. This is seen in the small statistical differences between the 10k and the 20k group catalogs (see Fig. 3) and also in the differences between the full and the central region of the 20k sample (see Tab. 3). We also performed tests with complete flux-limited mock samples, which also suggest that these statistics would improve by only a couple of percent. We would find more groups for a given observed richness N, but at any richness N the basic problems in detecting groups (e.g. the overlapping of groups in redshift space, low-density groups, interlopers in redshift space, etc.) would remain. This expectation is also supported by a comparison with the FOF group catalog from the highly complete GAMA survey, whose statistics were also obtained by comparison with the Millennium simulation (Robotham et al. 2011) and are broadly comparable, as far as we can determine from their paper. For example, their reported fraction of 2WM reconstructed mock groups of 77% is not substantially better than our value of ≈75% in Figure 6. The construction of high-quality group catalogs is presumably subject to limitations that are intrinsic to the nature of groups and not very sensitive to the details of the spectroscopic survey.

8. CONCLUSION 

In the first part of this paper, we have presented the construction and properties of the zCOSMOS 20k group catalog. The basic catalog was derived by applying an FOF multi-run algorithm, whose parameters were tuned using realistic simulated mock galaxy catalogs, to the ~16,500 high-quality spectroscopic redshifts of the 20k zCOSMOS sample.

The catalog contains 1498 groups in total and 192 groups with more than five spectroscopic members. If pairs are excluded, its one-way completeness is as high as 83%, and its one-way purity is 82%, compared to all groups principally observable within the 20k sample. About 75% of these groups exhibit a 2WM (i.e. a one-to-one correspondence) to real groups. These statistics are robust over essentially the whole range of richness, for groups with three or more members, and across the whole redshift range. The fraction of spectroscopic galaxies that can be associated with a group decreases from about 35% to 10% over the redshift range from z ~ 0 to z ~ 1. A prominent feature of the catalog is that the number of reconstructed groups traces very accurately the number of real groups for all richnesses.

Comparisons of the 20k group catalog with the 24 mock catalogs obtained from the DM Millennium simulation exhibit some similarities, but also some differences. The numbers of groups in the 20k catalog are well within the error bars of the number of reconstructed mock groups over a broad range of observed richnesses. However, there are too many small groups with N = 2-3 and too few large groups with N > 20 in the zCOSMOS group catalog compared to the mock catalogs. This could be an indication that the σ_8 of the Millennium simulation is in fact too large compared to the actual universe, in agreement with the latest cosmological measurements. The fraction of galaxies in groups for the total catalog shows fair agreement with the mock catalogs except for z > 0.7, where the fraction is significantly higher for the actual data. Similarly, particularly at high redshift and for volume-limited samples, there are apparently more galaxies in groups than expected from mock catalogs.

We detect clear evidence for the growth of cosmic structure over the last seven billion years, in the sense that the fraction of galaxies that are found in groups (in volume-limited samples) decreases significantly toward higher redshift.

In the second part of this paper, we have developed 
a scheme for complementing the group population by 
those galaxies which have no reliable spectroscopic red- 



shift, but only a photometric one. This was achieved by 
assigning to all photo-z galaxies a mock-calibrated as- 
sociation probability p for being a member of a given 
group. With the aid of the mock catalogs we studied the 
fidelity, distribution, and completeness of photo-z galax- 
ies associated to groups and found that the concept works 
comparably well for the actual data. 

Using the flux-limited group population and the membership probabilities, we introduced a probability p_M for each galaxy to be the most massive (in stellar mass) galaxy of its group. We found that, for the actual data as well as for the mock data, most of the groups of any richness have a clear, well-defined candidate for their most massive galaxy. The fidelity of p_M, however, depends sensitively on the measurement errors of the stellar mass. Despite this problem, selecting galaxies with p_M > 0.7 yields a success rate of finding the real most massive galaxies in more than 50% of cases.

As another application of the membership probability, we used the mock catalogs to study ten estimators for locating the spatial centers of the groups, of which four are based only on spec-z information and six on a combination of spec-z and photo-z. We found that the performance of all estimators typically depends on both the spectroscopic group richness N and the projected apparent extension of the group: the higher N and the more extended a group, the more effective is the inclusion of the photo-z information. Weighting the galaxy positions by the inverse of their projected Voronoi areas is likewise more effective in high-N groups. We found that combining the weighting by inverse Voronoi area with the weighting by stellar mass is superior to using either of these weighting schemes alone. We define "improved centers" by a combination of these estimators (without using stellar-mass information), which should yield offsets of < 100 kpc from the deepest point of the potential well for most groups of any richness class and group extension. According to the mock catalogs, even smaller offsets are achievable when stellar mass is taken into account.
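
The combined weighting can be written as a weighted mean of projected positions. The sketch below assumes weights w_i = p_i M_i / A_i, where A_i is the projected Voronoi area of galaxy i, M_i its stellar mass, and p_i its membership probability (p_i = 1 for spectroscopic members); this product form and the function name are illustrative assumptions, not the exact definition of the estimators or of the "improved centers". Voronoi areas are taken as precomputed inputs.

```python
import numpy as np


def weighted_group_center(x, y, voronoi_area, p_member, stellar_mass=None):
    """Weighted mean projected position of a group's member candidates.

    Assumed weights: p_member / voronoi_area, multiplied by stellar_mass
    when it is given (illustrative form only).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Galaxies in small Voronoi cells (dense regions) get larger weights.
    w = np.asarray(p_member, dtype=float) / np.asarray(voronoi_area, dtype=float)
    if stellar_mass is not None:
        # Optionally up-weight massive galaxies (linear stellar mass, not log).
        w = w * np.asarray(stellar_mass, dtype=float)
    return np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
```

Here x and y are projected positions (for instance in kpc relative to a reference point); the sketch uses a flat-sky approximation, which assumes that the angular extent of a group is small.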

The best of the ten estimators successfully selects the galaxy at the potential minimum (not to be confused with the most massive galaxy) for at least half of all mock groups with N > 5, and for 75% of all groups it yields offsets of less than 20 kpc from the real group center.
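
Statistics such as the fraction of groups with offsets below 20 kpc, or the fraction in which the galaxy at the potential minimum is recovered, can be computed from mock catalogs along the lines of the following sketch. The input layout (one row per group, with estimated and true centers in projected kpc and the IDs of the selected and true central galaxies) and the function name are hypothetical.

```python
import numpy as np


def center_performance(est_xy, true_xy, picked_id, true_central_id, max_offset=20.0):
    """Success rate and offset statistics of a center estimator on mock groups."""
    est = np.asarray(est_xy, dtype=float)    # shape (n_groups, 2), projected kpc
    true = np.asarray(true_xy, dtype=float)  # shape (n_groups, 2), potential minimum
    offsets = np.hypot(est[:, 0] - true[:, 0], est[:, 1] - true[:, 1])
    # Fraction of groups in which the selected galaxy is the true central.
    success_rate = np.mean(np.asarray(picked_id) == np.asarray(true_central_id))
    # Fraction of groups whose estimated center lies within max_offset of the truth.
    frac_within = np.mean(offsets < max_offset)
    return success_rate, frac_within, offsets
```

Quantiles of the returned offsets (e.g. `np.percentile(offsets, 75)`) give the offset below which a given fraction of groups falls.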

Finally, we investigated how well we can define samples of central and satellite galaxies, where centrals are defined to be the galaxies lying at the minimum of the gravitational potential. In addition to p_M, we introduced another probability, p_MA, which includes, besides the stellar mass, information on the local density at the position of the galaxy. While p_M and p_MA work comparably well for picking the central galaxy of a group, they are most powerful when taken in combination. We found that by applying suitable cuts in p, p_M, and p_MA, we are able to construct fairly pure samples of either centrals or satellites (typically about 60%-80% purity, depending on the richness of the group).

If we want to classify all galaxies in a binary way as either centrals or satellites, we can compute the completeness as well as the purity for both centrals and satellites. We defined a division such that, for the total flux-limited sample, the completeness and purity of centrals are 89% and 81%, respectively, and those of satellites 45% and 62%, respectively. When this division is restricted to the spectroscopic sample, the completeness and purity of the centrals are 93% and 84%, respectively, and those of the satellites 54% and 74%, respectively.
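
For a binary central/satellite split, completeness is the fraction of the true members of a class that are classified into it, and purity is the fraction of the galaxies classified into a class that truly belong to it. A minimal sketch, assuming boolean arrays from a mock catalog in which a galaxy counts as a true central if it lies at the potential minimum of its halo; the function name is hypothetical.

```python
import numpy as np


def completeness_purity(is_true_central, is_classified_central):
    """Completeness and purity of a binary central/satellite classification.

    Assumes both classes are non-empty in the truth and in the classification.
    """
    truth = np.asarray(is_true_central, dtype=bool)
    pred = np.asarray(is_classified_central, dtype=bool)
    result = {}
    for name, true_cls, pred_cls in (("central", truth, pred),
                                     ("satellite", ~truth, ~pred)):
        hits = np.sum(true_cls & pred_cls)
        # completeness: recovered / all true members of the class
        # purity: correct / all galaxies assigned to the class
        result[name] = (hits / np.sum(true_cls), hits / np.sum(pred_cls))
    return result
```

Moving more galaxies into the central class typically raises central completeness and satellite purity at the expense of central purity and satellite completeness, so the choice of division fixes where on this trade-off the quoted numbers lie.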

This research was supported by the Swiss National Science Foundation, and it is based on observations undertaken at the European Southern Observatory (ESO) Very Large Telescope (VLT) under the Large Program 175.A-0839.